Reading a text file in Python that has English and Arabic text
I am trying to read a text file that contains Instagram public posted images and their metadata. Each line holds one complete post along with its metadata, and part of each post is written in Arabic. When I read the file with Python and print a line, the Arabic text does not show; instead it appears as escapes such as \xd9\x8a\xd8.
This is the code snippet I am using to read the .txt file:

    import codecs

    test_file = codecs.open('instagram_info.txt', mode='r', encoding='utf-8')
    print("reading images urls file")
    counter = 0
    for line in test_file:
        print("line: ", line.encode("utf-8"))
        counter += 1
        print(counter)
        if counter == 50:
            break
    test_file.close()

This is an example line from the text file:
100158441 25.256887893 51.507485363 centerpoint 4f09c7a6e4b090ef234993e3 http://scontent.cdninstagram.com/hphotos-xpa1/outbound-distilleryimage9/t0.0-17/obpth/9ecde7ecac7811e3b87a12bcaa646ac5_8.jpg sarrah80 25.256887893 51.507485363 2014-03-15 19:37:45 1394912265 16144 ولا راضي يوقف يم الارنوب عشان اصوره dody_nasser said "هههه اكيد خايف الجبان 😆" nassersahim said "@sarrah80 يبغي يملغ عليكم" sarrah80 said "@dody_nasser بطل ولدي بس خبرج المود ومايسوي😄" sarrah80 said "@nassersahim انت شفت الأرنب شلون يطالعه ذبحني من الضحك 😂" arwa9009 said "حياتي" fatimaaljasssim said "حياتتتتتتتنتتي عليهم فديتهم" 6 non_al3yooon,mun.mun_almalki,__manoor__,monaalalii 46
Also, the current code adds a "b'" prefix to every line being read. Any idea why this is happening?
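The symptom can be reproduced in isolation. This is a minimal sketch using a single word taken from the sample line above (the variable names are illustrative only): calling .encode("utf-8") on a str produces a bytes object, and print() displays a bytes object through its repr, which produces both the b' prefix and the \xd9... escapes.

```python
# A single Arabic word, as it would appear inside a decoded line (a str).
line = "ولا"

# str.encode turns the text back into raw UTF-8 bytes.
encoded = line.encode("utf-8")

print(type(line).__name__, type(encoded).__name__)  # prints: str bytes
print("line: ", encoded)  # prints the repr: b'\xd9\x88\xd9\x84\xd8\xa7'
print("line: ", line)     # prints the Arabic text itself
```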
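- Python 3 supports Unicode natively, so you do not need codecs.open; the built-in open will work. What turns the text into \xd9\x8a\xd8 is the .encode call: line is already a decoded str, and .encode("utf-8") converts it back into a bytes object. print() displays a bytes object using its repr, which is also where the b' prefix comes from. Remove that function call and print the line directly: print("line: ", line)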
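Putting the answer together, this is one possible rewrite of the question's loop, offered as a sketch rather than the only correct form. It assumes the file is UTF-8 encoded; the function name print_first_lines and its limit parameter are hypothetical. It uses the built-in open() with a with statement so the file is closed automatically, and itertools.islice in place of the manual counter and break.

```python
from itertools import islice


def print_first_lines(path, limit=50):
    """Print up to `limit` lines of a UTF-8 text file as decoded text.

    Hypothetical helper; returns the lines so the caller can reuse them.
    """
    lines = []
    # The built-in open() decodes UTF-8 itself; the codecs module is not needed.
    with open(path, mode="r", encoding="utf-8") as test_file:
        print("reading images urls file")
        # islice stops after `limit` lines, replacing the counter/break logic.
        for counter, line in enumerate(islice(test_file, limit), start=1):
            # Print the str as-is; calling .encode() here would print a b'...' repr.
            print("line:", line.rstrip("\n"))
            print(counter)
            lines.append(line)
    return lines
```

Usage would be print_first_lines("instagram_info.txt"). The Arabic text then prints as text, provided the terminal itself can display UTF-8.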