Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
262 views
in Technique[技术] by (71.8m points)

python - How to find the importance of the features for a logistic regression model?

I have a binary prediction model trained by logistic regression algorithm. I want know which features(predictors) are more important for the decision of positive or negative class. I know there is coef_ parameter comes from the scikit-learn package, but I don't know whether it is enough to for the importance. Another thing is how I can evaluate the coef_ values in terms of the importance for negative and positive classes. I also read about standardized regression coefficients and I don't know what it is.

Lets say there are features like size of tumor, weight of tumor, and etc to make a decision for a test case like malignant or not malignant. I want to know which of the features are more important for malignant and not malignant prediction. Does it make sort of sense?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

One of the simplest options to get a feeling for the "influence" of a given parameter in a linear classification model (logistic being one of those), is to consider the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.

Consider this example:

import numpy as np    
from sklearn.linear_model import LogisticRegression

x1 = np.random.randn(100)
x2 = 4*np.random.randn(100)
x3 = 0.5*np.random.randn(100)
y = (3 + x1 + x2 + x3 + 0.2*np.random.randn()) > 0
X = np.column_stack([x1, x2, x3])

m = LogisticRegression()
m.fit(X, y)

# The estimated coefficients will all be around 1:
print(m.coef_)

# Those values, however, will show that the second parameter
# is more influential
print(np.std(X, 0)*m.coef_)

An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:

m.fit(X / np.std(X, 0), y)
print(m.coef_)

Note that this is the most basic approach and a number of other techniques for finding feature importance or parameter influence exist (using p-values, bootstrap scores, various "discriminative indices", etc).

I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...