Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

data mining - How to deal with multiple class ROC analysis in R (pROC package)?

I am using the multiclass.roc function from the pROC package in R. For instance, I trained a random forest on a data set; here is my code:

# randomForest & pROC packages should be installed:
# install.packages(c('randomForest', 'pROC'))
data(iris)
library(randomForest)
library(pROC)
set.seed(1000)
# 3-class in response variable
rf = randomForest(Species~., data = iris, ntree = 100)
# predict(.., type = 'prob') returns a probability matrix
multiclass.roc(iris$Species, predict(rf, iris, type = 'prob'))

And the result is:

Call:
multiclass.roc.default(response = iris$Species, predictor = predict(rf,     
iris, type = "prob"))
Data: predict(rf, iris, type = "prob") with 3 levels of iris$Species: setosa,   
versicolor, virginica.
Multi-class area under the curve: 0.5142

Is this right? Thanks!!!

"pROC" reference: http://www.inside-r.org/packages/cran/pROC/docs/multiclass.roc

1 Reply

As you saw in the reference, multiclass.roc expects a "numeric vector (...)", and the documentation of roc, which is linked from there (though for some reason not from the link you provided), further says "of the same length than response". You are passing a numeric matrix with 3 columns, which is wrong and has not been supported since pROC 1.6. I don't know what it was computing before that, but it was probably not what you were expecting.

This means you must summarize your predictions in a single atomic numeric vector. For your model, you could use the following, although converting a factor to a numeric generally doesn't make much sense:

predictions <- as.numeric(predict(rf, iris, type = 'response'))
multiclass.roc(iris$Species, predictions)

What this code actually does is compute 3 ROC curves on your predictions (setosa vs. versicolor, versicolor vs. virginica, and setosa vs. virginica) and average their AUCs.
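The pairwise averaging described above can be reproduced by hand, which may make it easier to see what the single multi-class number hides. This is only a sketch under assumptions: the names class_pairs and pairwise_auc are illustrative and not part of pROC, and the direction of each curve is left to roc's default auto-detection. It also still evaluates on the training data, as in the question.

```r
library(randomForest)
library(pROC)

data(iris)
set.seed(1000)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# Predicted class labels coded as 1, 2, 3 (in factor-level order)
predictions <- as.numeric(predict(rf, iris, type = "response"))

# All unordered pairs of class levels: a 2 x 3 character matrix
class_pairs <- combn(levels(iris$Species), 2)

# One binary ROC curve per pair; with levels = pair, roc() only uses
# the observations whose response belongs to one of the two classes
pairwise_auc <- apply(class_pairs, 2, function(pair) {
  r <- suppressMessages(roc(iris$Species, predictions, levels = pair))
  as.numeric(auc(r))
})
names(pairwise_auc) <- apply(class_pairs, 2, paste, collapse = " vs. ")

pairwise_auc        # the three pairwise AUCs
mean(pairwise_auc)  # their unweighted average
```

Looking at the three values separately shows which pair of classes, if any, drags the average down.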

Three more comments:

  • I say converting a factor to numeric doesn't make sense because, unless the classification is perfect, you will get different results if you reorder the levels. This is why pROC doesn't do it automatically: you must think about it in your setup.
  • In general, this multiclass averaging doesn't really make sense, and you're better off rethinking your question in terms of binary classification. There are more advanced multiclass methods (with a ROC surface, etc.) that aren't implemented in pROC yet.
  • As @cbeleites stated, it is not correct to evaluate a model on its training data (resubstitution), so in a real example you must keep a test set aside or use cross-validation.
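As a rough illustration of the last two comments, the question can be recast as a binary one and evaluated on held-out data. The sketch below makes several assumptions that are not in the original question: a 30% random test split, "virginica vs. everything else" as the binary question, and the helper names test_idx, is_virginica and roc_virginica, which are made up for the example.

```r
library(randomForest)
library(pROC)

data(iris)
set.seed(1000)

# Hold out about 30% of the rows as a test set (arbitrary split ratio)
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit on the training rows only
rf <- randomForest(Species ~ ., data = train, ntree = 100)

# Class probabilities for the held-out rows: one column per species
prob <- predict(rf, test, type = "prob")

# Binary response: virginica (case) vs. any other species (control)
is_virginica <- factor(
  ifelse(test$Species == "virginica", "virginica", "other"),
  levels = c("other", "virginica")
)

# The predictor is a numeric vector of the same length as the response;
# direction = "<" states that controls should score lower than cases
roc_virginica <- roc(is_virginica, prob[, "virginica"],
                     levels = c("other", "virginica"), direction = "<")
auc(roc_virginica)
```

The same pattern can be repeated for each species in turn, and the single split can be swapped for cross-validation if the data set is small.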
