python - Extract words from a string to create a featureset for nltk
I am using nltk and nltk-trainer for sentiment analysis. I have an accurate algorithm pickled. When I follow the instructions provided with nltk-trainer, everything works well.
Here is what works (it returns the desired output):
>>> words = ['some', 'words', 'in', 'a', 'sentence']
>>> feats = dict([(word, True) for word in words])
>>> classifier.classify(feats)
'feats' looks like this:
Out[52]: {'a': True, 'in': True, 'sentence': True, 'some': True, 'words': True}
However, I don't want to type in the words separated by commas and apostrophes each time. I have a script that does preprocessing on the text and returns a string that looks like this:
"[['words'], ['in'], ['a'], ['sentence']]"`
However, when I try to define 'feats' from this string, I am left with something that looks like this:
{' ': True, "'": True, ',': True, '[': True, ']': True, 'a': True, 'b': True, 'c': True, 'e': True, 'h': True, 'i': True, 'l': True, 'n': True, 'o': True, 'p': True, 'r': True, 's': True, 'u': True}
Obviously the classifier function isn't effective with this input. It appears the 'feats' definition is extracting individual letters from the text string instead of whole words. How do I fix this?
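To illustrate what happens (a minimal reproduction using the example string above, not my real preprocessing output): a Python string is a sequence of characters, so iterating over it keys the dict by character rather than by word.

>>> text = "[['words'], ['in'], ['a'], ['sentence']]"
>>> # iterating over the string yields one character at a time
>>> feats = dict([(ch, True) for ch in text])
>>> sorted(feats)[:6]
[' ', "'", ',', '[', ']', 'a']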
I am not sure I understand what you are trying to do, but I would suggest:
import nltk

words = nltk.word_tokenize("some words in a sentence")
feats = {word: True for word in words}
classifier.classify(feats)
If you want to use the pre-processed text, try:
text = "[['words'], ['in'], ['a'], ['sentence']]"
words = text[3:len(text)-3].split("'], ['")
feats = {word: True for word in words}
classifier.classify(feats)
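Another option, assuming the preprocessed string is always a valid Python literal like the one shown, is to parse it with ast.literal_eval from the standard library and flatten the resulting nested list, instead of counting slice offsets by hand. A rough sketch:

import ast

text = "[['words'], ['in'], ['a'], ['sentence']]"
# safely parse the string back into the nested list it represents
nested = ast.literal_eval(text)            # [['words'], ['in'], ['a'], ['sentence']]
words = [w for sub in nested for w in sub] # flatten to ['words', 'in', 'a', 'sentence']
feats = {word: True for word in words}
classifier.classify(feats)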