Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

machine learning - R: "argument is of length 0" (empty plot)

I am using R and trying to follow this tutorial: https://cran.r-project.org/web/packages/lime/vignettes/Understanding_lime.html

I tried to create my own data to replicate this tutorial with:

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)  # note: only 10 draws; data.frame() below recycles them to fill 100 rows
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))

response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))

#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all the data except the first observation
model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)

#run the "lime" procedure on the first observation
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)
    
#visualize the results - here is the error:
plot_features(explanation, ncol = 1)

Error in if (nrow(explanation) == 0) stop("No explanations to plot", call. = FALSE) : 
  argument is of length zero

Can someone please show me what I am doing wrong? Is it because this procedure is not meant to be run on a single observation?

Thanks

UPDATE: If I change this line of code:

model<-randomForest(response ~., data = f[-1,] , mtry=2, ntree=100)

to

model<-randomForest(response ~., data = f , mtry=2, ntree=100)

the code now seems to run. Fitting on all of f is not a big problem in itself, since I can write f_new = f[1,] and f = f[-1,] before this step. However, the plot does not fully show up. Is this a problem with my graphics console? (Note: the tutorial code from the website runs and plots perfectly.)

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-14 lime_0.5.1          MASS_7.3-53        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           lubridate_1.7.9      lattice_0.20-41      class_7.3-17         assertthat_0.2.1    
 [6] glmnet_4.0-2         digest_0.6.25        ipred_0.9-9          foreach_1.5.1        mime_0.9            
[11] R6_2.4.1             plyr_1.8.6           stats4_4.0.2         ggplot2_3.3.2        pillar_1.4.6        
[16] rlang_0.4.7          caret_6.0-86         rstudioapi_0.11      data.table_1.12.8    rpart_4.1-15        
[21] Matrix_1.2-18        shinythemes_1.1.2    labeling_0.3         splines_4.0.2        gower_0.2.2         
[26] stringr_1.4.0        htmlwidgets_1.5.2    munsell_0.5.0        tinytex_0.26         shiny_1.5.0         
[31] compiler_4.0.2       httpuv_1.5.4         xfun_0.15            pkgconfig_2.0.3      shape_1.4.5         
[36] htmltools_0.5.0      nnet_7.3-14          tidyselect_1.1.0     tibble_3.0.3         prodlim_2019.11.13  
[41] codetools_0.2-16     crayon_1.3.4         dplyr_1.0.2          withr_2.3.0          later_1.1.0.1       
[46] recipes_0.1.13       ModelMetrics_1.2.2.2 grid_4.0.2           nlme_3.1-149         xtable_1.8-4        
[51] gtable_0.3.0         lifecycle_0.2.0      magrittr_1.5         pROC_1.16.2          scales_1.1.1        
[56] stringi_1.4.6        farver_2.0.3         reshape2_1.4.4       promises_1.1.1       timeDate_3043.102   
[61] ellipsis_0.3.1       generics_0.0.2       vctrs_0.3.2          xgboost_1.1.1.1      lava_1.6.8          
[66] iterators_1.0.13     tools_4.0.2          glue_1.4.1           purrr_0.3.4          fastmap_1.0.1  


1 Reply


I might have got it to work. Using the code from the update (with the model fitted on all of f), I get a plot:

#load libraries
library(MASS)
library(lime)
library(randomForest)

#create data
var_1<- rnorm(100,1,4)
var_2 <-rnorm(10,10,5)
var_3<- c("0","2", "4")
var_3 <- sample(var_3, 100, replace=TRUE, prob=c(0.3, 0.6, 0.1))

response<- c("1","0")
response <- sample(response, 100, replace=TRUE, prob=c(0.3, 0.7))

#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response)

#declare var_3 and response_variable as factors
f$var_3 = as.factor(f$var_3)
f$response = as.factor(f$response)

# run random forest on all the data (the first observation is no longer held out)
model<-randomForest(response ~., data = f , mtry=2, ntree=100)
model<-as_classifier(model, labels = NULL)

#run the "lime" procedure (note: f[-1, ] is every observation except the first)
explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)

#visualize the results:
plot_features(explanation, ncol = 1)

[plot image not preserved]

I then changed only the last line, leaving the rest of the code above exactly as it was, so that just a few cases are plotted:

#visualize the results for a handful of cases only:
plot_features(explanation, cases = 1:4, ncol = 1)

[plot image not preserved]

I don't fully understand what changed, but at least the graphics now show up. Suppose I am interested in only the first observation. I am still unsure whether these lines should be:

explainer <- lime(f[-1,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f[-1, ], explainer, n_labels = 1, n_features = 4)

or

explainer <- lime(f, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(f, explainer, n_labels = 1, n_features = 4)
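For what it's worth, if the goal is to explain only the first observation, one possible arrangement is to hold that row out: fit the model and build the explainer on the remaining rows, then pass just the held-out row to explain(). The sketch below is not from the tutorial. It assumes the response column should be dropped from what lime() and explain() see, it uses 100 draws for var_2, and it adds a set.seed() call purely for repeatability. The names predictors, train and new_obs are made up for illustration.

```r
library(lime)
library(randomForest)

set.seed(42)  # assumption: seed added only so the sketch is repeatable

# simulated data mirroring the question (every column has 100 values here)
f <- data.frame(
  var_1 = rnorm(100, 1, 4),
  var_2 = rnorm(100, 10, 5),
  var_3 = factor(sample(c("0", "2", "4"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1))),
  response = factor(sample(c("1", "0"), 100, replace = TRUE, prob = c(0.3, 0.7)))
)

# hypothetical helper names: predictor columns, training rows, held-out row
predictors <- setdiff(names(f), "response")
train <- f[-1, ]
new_obs <- f[1, predictors]

# fit on everything except the first observation
model <- randomForest(response ~ ., data = train, mtry = 2, ntree = 100)
model <- as_classifier(model)

# build the explainer from the training predictors, explain only the held-out row
explainer <- lime(train[, predictors], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(new_obs, explainer, n_labels = 1, n_features = 3)

# a single explained case gives a single panel
plot_features(explanation, ncol = 1)
```

With explain(f[-1, ], ...) there are 99 explained cases, and plot_features() draws one panel per case unless cases is supplied, which may be why the unrestricted plot looked empty while the one limited to a few cases did not.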

I am also not sure what the difference is between "probability" and "explanation fit" in the plot titles. I assume "probability" is the class probability predicted by the random forest model, and "explanation fit" measures how well the local LIME model fits (its "explanatory power").
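As far as I can tell from the lime documentation, the two numbers in each panel title are taken from columns of the object returned by explain(): label_prob (the probability the original model assigns to the explained label) and model_r2 (the R-squared of the local surrogate model). A minimal sketch for looking at them directly, again assuming simulated data like the question's, with a seed and 100 draws for var_2 as my own additions:

```r
library(lime)
library(randomForest)

set.seed(1)  # assumption: seed added only so the sketch is repeatable

f <- data.frame(
  var_1 = rnorm(100, 1, 4),
  var_2 = rnorm(100, 10, 5),
  var_3 = factor(sample(c("0", "2", "4"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1))),
  response = factor(sample(c("1", "0"), 100, replace = TRUE, prob = c(0.3, 0.7)))
)
x <- f[, c("var_1", "var_2", "var_3")]  # predictors only

model <- as_classifier(randomForest(response ~ ., data = f[-1, ], ntree = 100))
explainer <- lime(x[-1, ], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(x[1, ], explainer, n_labels = 1, n_features = 3)

# there is one row per selected feature, so these values repeat within a case
unique(explanation[, c("case", "label", "label_prob", "model_r2")])

# the random forest's own class probabilities for the same row, for comparison
predict(model, x[1, ], type = "prob")
```

A low model_r2 would suggest the local linear model is a poor approximation of the random forest around that observation, so its feature weights deserve less trust.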

(If someone knows about this, could they please comment below? thanks)

