Reading a text file in Python that has English and Arabic text


I am trying to read a text file that contains publicly posted Instagram images and their metadata. Each line holds one complete post along with its metadata. Part of each post is written in Arabic. When I read the file with Python and print a line, the Arabic text is not displayed. Instead it appears as escape sequences such as \xd9\x8a\xd8.

This is the code snippet I am using to read the .txt file:

    import codecs

    test_file = codecs.open('instagram_info.txt', mode='r', encoding='utf-8')
    print("reading  images urls file")
    counter = 0
    for line in test_file:
        # line is already a decoded str here; .encode() turns it back into bytes
        print("line: ", line.encode("utf-8"))
        counter += 1
        print(counter)
        if counter == 50:
            break
    test_file.close()

This is an example line from the text file:

100158441   25.256887893    51.507485363    centerpoint 4f09c7a6e4b090ef234993e3               http://scontent.cdninstagram.com/hphotos-xpa1/outbound-distilleryimage9/t0.0-17/obpth/9ecde7ecac7811e3b87a12bcaa646ac5_8.jpg sarrah80    25.256887893    51.507485363    2014-03-15 19:37:45 1394912265  16144       ولا راضي يوقف يم الارنوب عشان اصوره dody_nasser said "هههه اكيد خايف الجبان 😆"  nassersahim said "@sarrah80 يبغي يملغ عليكم"  sarrah80 said "@dody_nasser بطل ولدي بس خبرج المود ومايسوي😄"  sarrah80 said "@nassersahim انت شفت الأرنب شلون يطالعه ذبحني من الضحك 😂"  arwa9009 said "حياتي"  fatimaaljasssim said "حياتتتتتتتنتتي عليهم فديتهم"  6   non_al3yooon,mun.mun_almalki,__manoor__,monaalalii  46 

Also, the current code adds a "b'" prefix to every line that is read. Any idea why this is happening?

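  1. Python 3 supports Unicode natively. You do not need codecs.open; the built-in open works. Pass encoding='utf-8' to it so the result does not depend on the platform's default encoding.
  2. The .encode call is what turns the text into this: \xd9\x8a\xd8. It converts each str back into a bytes object, and printing a bytes object shows the b'...' prefix with every non-ASCII byte escaped. Remove that call: print("line: ", line)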
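The two points above can be sketched as follows. This is a minimal illustration and not the asker's exact script: the helper name `read_lines`, the `limit` parameter, and the temporary sample file are assumptions made for the example, standing in for `instagram_info.txt`.

```python
import os
import tempfile


def read_lines(path, limit=50):
    """Return up to `limit` lines of a UTF-8 text file as str objects."""
    lines = []
    # The built-in open() decodes the file; no codecs module is needed.
    with open(path, mode="r", encoding="utf-8") as handle:
        for counter, line in enumerate(handle, start=1):
            lines.append(line.rstrip("\n"))
            if counter == limit:
                break
    return lines


if __name__ == "__main__":
    # Hypothetical sample data standing in for instagram_info.txt.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write('arwa9009 said "حياتي"\n')
        sample_path = tmp.name

    for line in read_lines(sample_path):
        print("line: ", line)                   # str: Arabic shown as text
        print("bytes:", line.encode("utf-8"))   # bytes: b'...' with \x escapes

    os.remove(sample_path)
```

Printing the str displays the Arabic as long as the terminal itself uses a Unicode-capable encoding. Printing the encoded value reproduces the b'...' output from the question.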
