Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
453 views
in Technique[技术] by (71.8m points)

python - tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer

this page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:

As tf–idf is a very often used for text features, there is also another class called TfidfVectorizer that combines all the option of CountVectorizer and TfidfTransformer in a single model.

then I followed the code and use fit_transform() on my corpus. How to get the weight of each feature computed by fit_transform()?

I tried:

In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_

AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'

but this attribute is missing.

Thanks

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since version 0.15, the tf-idf score of each feature can be retrieved via the attribute idf_ of the TfidfVectorizer object:

from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_
print dict(zip(vectorizer.get_feature_names(), idf))

Output:

{u'is': 1.0,
 u'nice': 1.4054651081081644,
 u'strange': 1.4054651081081644,
 u'this': 1.0,
 u'very': 1.0}

As discussed in the comments, prior to version 0.15, a workaround is to access the attribute idf_ via the supposedly hidden _tfidf (an instance of TfidfTransformer) of the vectorizer:

idf = vectorizer._tfidf.idf_
print dict(zip(vectorizer.get_feature_names(), idf))

which should give the same output as above.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...