Why can't I swap Unicode characters in Python?


Why can't I swap Unicode characters in this code?

    # -*- coding: utf-8 -*-
    character_swap = {'ą': 'a', 'ż': 'z', 'ó': 'o'}
    text = 'idzie wąż wąską dróżką'
    print text
    print ''.join(character_swap.get(ch, ch) for ch in text)

output: idzie wąż wąską dróżką

expected output: idzie waz waska drozka

You need to decode the text to Unicode first, then encode each character back to UTF-8 for the dictionary lookup:

    >>> print ''.join(character_swap.get(ch.encode('utf8'), ch) for ch in text.decode('utf8'))
    idzie waz waska drozka
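The same fix can be sketched as a small function, assuming you are free to write the dictionary and the text as Unicode literals (the u'' prefix). The names swap_characters and CHARACTER_SWAP are hypothetical and not part of the original code. Because the text is decoded once up front, the per-character encode step is no longer needed:

```python
# -*- coding: utf-8 -*-

# Keys and values are Unicode strings, so lookups compare whole characters.
CHARACTER_SWAP = {u'ą': u'a', u'ż': u'z', u'ó': u'o'}


def swap_characters(text, mapping=CHARACTER_SWAP):
    """Return text with every character found in mapping replaced.

    Hypothetical helper: accepts a Unicode string or UTF-8 encoded bytes.
    """
    # Byte strings are decoded first so that iteration yields characters,
    # not the individual bytes of a multi-byte sequence.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    return u''.join(mapping.get(ch, ch) for ch in text)


print(swap_characters(u'idzie wąż wąską dróżką'))  # idzie waz waska drozka
```

This sketch is written to run on Python 2.7 as well as Python 3.3+. On Python 3, strings are already Unicode, so the decode branch only applies to bytes input.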

This happens because a plain string literal in Python 2 is a byte string, so iterating over it yields individual bytes rather than characters. What the loop actually sees is:

    >>> [i for i in text]
    ['i', 'd', 'z', 'i', 'e', ' ', 'w', '\xc4', '\x85', '\xc5', '\xbc', ' ', 'w', '\xc4', '\x85', 's', 'k', '\xc4', '\x85', ' ', 'd', 'r', '\xc3', '\xb3', '\xc5', '\xbc', 'k', '\xc4', '\x85']

while the character ą is stored as two bytes:

    >>> 'ą'
    '\xc4\x85'

As you can see, iteration splits ą into two parts, \xc4 and \x85, and neither one matches the dictionary key. To get rid of this, first decode the text from UTF-8:

    >>> [i for i in text.decode('utf8')]
    [u'i', u'd', u'z', u'i', u'e', u' ', u'w', u'\u0105', u'\u017c', u' ', u'w', u'\u0105', u's', u'k', u'\u0105', u' ', u'd', u'r', u'\xf3', u'\u017c', u'k', u'\u0105']
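To make the byte versus character difference concrete, here is a short sketch that compares the two lengths for a single word. The variable names are illustrative only, and the code is written to run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-

text = u'wąż'                    # Unicode string: one item per character
encoded = text.encode('utf-8')   # byte string: ą and ż take two bytes each

code_points = len(text)
byte_count = len(encoded)

print(code_points)  # 3
print(byte_count)   # 5

# The UTF-8 bytes of ą are exactly the two values seen in the byte-wise loop.
assert u'ą'.encode('utf-8') == b'\xc4\x85'
```

Any lookup keyed on whole characters therefore has to run on the decoded text, where ą is a single item.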
