python - Extract words from a string to create an NLTK featureset


I am using NLTK and nltk-trainer for sentiment analysis. I have an accurate algorithm pickled. When I follow the instructions provided by nltk-trainer, it works well.

Here is what works (it returns the desired output):

>>> words = ['some', 'words', 'in', 'a', 'sentence']
>>> feats = dict([(word, True) for word in words])
>>> classifier.classify(feats)

'feats' looks like this:

Out[52]: {'a': True, 'in': True, 'sentence': True, 'some': True, 'words': True}
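A featureset for an NLTK classifier is just a plain dict mapping each feature name to a value, here every word to True. The `dict([...])` construction above can be wrapped in a small reusable helper using a dict comprehension. A minimal sketch; the name `word_feats` is hypothetical:

```python
def word_feats(words):
    """Build a bag-of-words featureset: each word maps to True."""
    return {word: True for word in words}


feats = word_feats(['some', 'words', 'in', 'a', 'sentence'])
print(feats)
# {'some': True, 'words': True, 'in': True, 'a': True, 'sentence': True}
```

The result can then be passed to `classifier.classify(feats)` exactly as in the working example.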

However, I don't want to type in the words separated by commas and apostrophes each time. I have a script that does some preprocessing on the text and returns a string that looks like this:

"[['words'], ['in'], ['a'], ['sentence']]"` 

However, when I try to define 'feats' using this string, I am left with something that looks like this:

{' ': True,  "'": True,  ',': True,  '[': True,  ']': True,  'a': True,  'b': True,  'c': True,  'e': True,  'h': True,  'i': True,  'l': True,  'n': True,  'o': True,  'p': True,  'r': True,  's': True,  'u': True}

Obviously the classifier function isn't effective with this input. It appears the 'feats' definition is extracting individual characters from the text string instead of whole words. How do I fix this?
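The behaviour comes from how Python iterates over a string: looping over a `str` yields one character at a time, so building the featureset directly from the preprocessed string keys it by characters. A minimal reproduction, assuming 'feats' was built by looping over the string itself (the exact code used is not shown above):

```python
text = "[['words'], ['in'], ['a'], ['sentence']]"

# Iterating over a str yields single characters, not words,
# so every letter, bracket, quote, comma and space becomes a feature.
char_feats = {ch: True for ch in text}
print(sorted(char_feats))
```

The whole words never appear as keys, which is why the classifier cannot make use of this input.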

I'm not sure I understand, but I suggest:

import nltk

words = nltk.word_tokenize("some words in a sentence")
feats = {word: True for word in words}
classifier.classify(feats)  # classifier: your pickled classifier
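`nltk.word_tokenize` relies on NLTK's Punkt tokenizer data being downloaded. If that is not available, or simple whitespace splitting is good enough for your input, `str.split()` from the standard library is a dependency-free substitute. A minimal sketch; the helper name `feats_from_sentence` is hypothetical, and note that unlike `word_tokenize` it does not separate punctuation from words:

```python
def feats_from_sentence(sentence):
    # str.split() with no arguments splits on runs of whitespace and
    # never yields empty strings; punctuation stays attached to words.
    return {word: True for word in sentence.split()}


print(feats_from_sentence("some words in a sentence"))
# {'some': True, 'words': True, 'in': True, 'a': True, 'sentence': True}
```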

If you want to use the pre-processed text, try:

text = "[['words'], ['in'], ['a'], ['sentence']]"
words = text[3:-3].split("'], ['")  # strip the outer "[['" and "']]", then split
feats = {word: True for word in words}
classifier.classify(feats)
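Slicing and splitting on "'], ['" depends on the exact spacing and quoting of the preprocessed string. A sturdier alternative (not part of the answer above) is to parse the string as a Python literal with the standard library's `ast.literal_eval`, then flatten the nested lists. This sketch assumes the preprocessing script always produces a valid Python list of lists of strings:

```python
import ast

text = "[['words'], ['in'], ['a'], ['sentence']]"

# ast.literal_eval parses Python literals (lists, strings, numbers, ...)
# without executing arbitrary code, unlike eval().
nested = ast.literal_eval(text)  # [['words'], ['in'], ['a'], ['sentence']]

# Flatten the inner lists into a single list of words.
words = [word for group in nested for word in group]

feats = {word: True for word in words}
print(feats)
# {'words': True, 'in': True, 'a': True, 'sentence': True}
```

If you control the preprocessing script, returning the list itself instead of its string form avoids the parsing step entirely.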
