0 votes
219 views
in Technique by (71.8m points)

python - Creating confusion matrix with StratifiedKFold in Random Forest

A similar question, but with a different dataset: confusion matrix and classification report of StratifiedKFold

I'm trying to create a confusion matrix using the Titanic data set. I followed this Kaggle notebook, which has a part that runs StratifiedKFold with a RandomForest. I modified the code a bit to get a confusion matrix; see the line marked # ADDED.

On the other hand, I do not understand how the train/test split works with StratifiedKFold, since in this notebook the train and test sets are already separated. I do not know where to get y_test to feed into the confusion matrix.
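For context, here is a minimal sketch (mine, not from the notebook) of what skf.split actually yields: on each iteration it returns two integer index arrays into the training data, so the "test" half of each fold is a validation slice of X_train/y_train, not the separate X_test set:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X_demo = np.arange(20).reshape(10, 2)               # 10 toy samples, 2 features
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # balanced toy labels

skf_demo = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for trn_idx, val_idx in skf_demo.split(X_demo, y_demo):
    # both index arrays point into the same training data;
    # each fold's validation labels are simply y_demo[val_idx]
    print('train:', trn_idx, 'validation:', val_idx)

Here is the notebook's loop with my modification: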

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc
from sklearn.model_selection import StratifiedKFold
N = 2
oob = 0
probs = pd.DataFrame(np.zeros((len(X_test), N * 2)), columns=['Fold_{}_Prob_{}'.format(i, j) for i in range(1, N + 1) for j in range(2)])
importances = pd.DataFrame(np.zeros((X_train.shape[1], N)), columns=['Fold_{}'.format(i) for i in range(1, N + 1)], index=df_all.columns)


fprs, tprs, scores, conf_matrix = [], [], [], []

skf = StratifiedKFold(n_splits=N, random_state=N, shuffle=True)

for fold, (trn_idx, val_idx) in enumerate(skf.split(X_train, y_train), 1):
    print('Fold {}\n'.format(fold))
    
    # Fitting the model
    single_best_model.fit(X_train[trn_idx], y_train[trn_idx])
    
    # ADDED
    y_pred = single_best_model.predict(X_test)
    # NOTE: y_test is never defined in the notebook, so this line raises a NameError
    conf_matrix.append(confusion_matrix(y_test, y_pred))

    # Computing Train AUC score
    trn_fpr, trn_tpr, trn_thresholds = roc_curve(y_train[trn_idx], single_best_model.predict_proba(X_train[trn_idx])[:, 1])
    trn_auc_score = auc(trn_fpr, trn_tpr)
    # Computing Validation AUC score
    val_fpr, val_tpr, val_thresholds = roc_curve(y_train[val_idx], single_best_model.predict_proba(X_train[val_idx])[:, 1])
    val_auc_score = auc(val_fpr, val_tpr)  
      
    scores.append((trn_auc_score, val_auc_score))
    fprs.append(val_fpr)
    tprs.append(val_tpr)
    
    # X_test probabilities
    probs.loc[:, 'Fold_{}_Prob_0'.format(fold)] = single_best_model.predict_proba(X_test)[:, 0]
    probs.loc[:, 'Fold_{}_Prob_1'.format(fold)] = single_best_model.predict_proba(X_test)[:, 1]
    importances.iloc[:, fold - 1] = single_best_model.feature_importances_
        
    oob += single_best_model.oob_score_ / N
    print('Fold {} OOB Score: {}\n'.format(fold, single_best_model.oob_score_))
    
print('Average OOB Score: {}'.format(oob))

The first error I'm getting is that y_test is not defined, and I don't know how the notebook's author defined y_test. I cannot find it anywhere in the notebook.
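One workaround I'm considering (my own sketch, not from the notebook; it assumes X_train and y_train are NumPy arrays, as they are in the notebook) is to build the confusion matrix from each fold's validation slice, whose true labels are known, instead of from the unlabeled Kaggle test set:

# inside the fold loop, after fitting single_best_model:
y_val_pred = single_best_model.predict(X_train[val_idx])
conf_matrix.append(confusion_matrix(y_train[val_idx], y_val_pred))

# after the loop: element-wise sum of the per-fold matrices
overall_cm = np.sum(conf_matrix, axis=0)

Summing the per-fold matrices then gives one overall matrix that covers every training sample exactly once as a validation case.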

After I get the matrix, I want to produce a plot like the one in this article:

https://medium.com/analytics-vidhya/evaluating-a-random-forest-model-9d165595ad56

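A common way to draw that kind of heatmap (my own sketch, not taken from the article; cm holds toy example values, and the Died/Survived tick labels are my assumption for the Titanic classes 0 and 1) is with seaborn:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

cm = np.array([[50, 10],   # toy example values, not real results
               [7, 33]])

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Died', 'Survived'],
            yticklabels=['Died', 'Survived'])
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion matrix')
plt.show()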



1 Reply

0 votes
by (71.8m points)
Waiting for an expert to answer.
