Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


R - Add an extra level to factors in a data frame

I have a data frame with numeric and ordered factor columns. There are a lot of NA values, so no level is assigned to them. I changed NA to "No Answer", but the levels of the factor columns don't contain that level. Here is how I started, but I don't know how to finish it in an elegant way:

addNoAnswer = function(df) {
   factorOrNot = sapply(df, is.factor)
   levelsList = lapply(df[, factorOrNot], levels)
   levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
   ...

Is there a way to directly apply new levels to factor columns, for example, something like this:

df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)

Of course, this doesn't work correctly.

I want the existing order of levels preserved, with the "No Answer" level added at the end.
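
The attempt above goes wrong because lapply() hands the entire levelsList to every factor() call as its levels argument, instead of giving each column its own vector. One way to finish addNoAnswer() is to walk the factor columns and their level vectors in parallel with Map(). This is a minimal sketch rather than the accepted answer; the example data frame and its column names (q1, n, q2) are made up for illustration.

```r
# Sketch: finish addNoAnswer() by pairing each factor column with its own
# extended levels vector via Map().
addNoAnswer <- function(df) {
  factorOrNot <- vapply(df, is.factor, logical(1))
  # df[factorOrNot] (list-style indexing) stays a data frame even when there is
  # only one factor column; df[, factorOrNot] would drop it to a plain vector.
  levelsList <- lapply(df[factorOrNot], function(x) c(levels(x), "No Answer"))
  df[factorOrNot] <- Map(function(x, lev) {
    # factor() keeps ordered-ness by default (ordered = is.ordered(x))
    x <- factor(x, levels = lev)
    x[is.na(x)] <- "No Answer"
    x
  }, df[factorOrNot], levelsList)
  df
}

# Hypothetical example data
df <- data.frame(
  q1 = factor(c("low", NA, "high"), levels = c("low", "high"), ordered = TRUE),
  n  = c(1, NA, 3),
  q2 = factor(c(NA, "yes", "no"))
)
res <- addNoAnswer(df)
str(res)
```

Only factor columns are touched, so the NA in the numeric column n is left as it is.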



1 Reply


The levels() function also works as a replacement function, levels(x) <- value, so adding a new level is easy:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
# Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1), "No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
# Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
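
Since the question mentions ordered factors, it may help to see that the same two steps work on them too: levels<- keeps the object's attributes, so the result stays ordered and "No Answer" becomes the highest level. The helper name add_no_answer_level() below is hypothetical; it only wraps the two lines shown above.

```r
# Hypothetical helper wrapping the two steps above.
add_no_answer_level <- function(f, label = "No Answer") {
  # Append the new level last; the existing level order is untouched
  levels(f) <- c(levels(f), label)
  # The label is now a valid level, so the NAs can be replaced with it
  f[is.na(f)] <- label
  f
}

o <- factor(c("low", NA, "high", "mid"),
            levels = c("low", "mid", "high"), ordered = TRUE)
o2 <- add_no_answer_level(o)
print(o2)
```

Keep in mind that with an ordered factor, comparisons such as o2 > "high" will now treat "No Answer" as ranking above every real answer.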

You can then loop over all the factor columns in a data.frame:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c", "a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d", "e", "a"))
df1 <- data.frame(f1, n1 = 1:11, f2, f3)

str(df1)
# 'data.frame':   11 obs. of  4 variables:
#  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
#  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
#  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...

# Append "No Answer" as the last level of every factor column
for (i in seq_along(df1)) {
  if (is.factor(df1[[i]])) levels(df1[[i]]) <- c(levels(df1[[i]]), "No Answer")
}
# Replace the NAs (in this example they occur only in factor columns)
df1[is.na(df1)] <- "No Answer"

str(df1)
# 'data.frame':   11 obs. of  4 variables:
#  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
#  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
#  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
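
One caveat with the loop: df1[is.na(df1)] <- "No Answer" assigns into every column that contains an NA, so a numeric column with missing values would be silently converted to character. A loop-free sketch that restricts the change to factor columns is shown below; the NA placed in n1 is added here purely to demonstrate that numeric columns are left alone.

```r
# Rebuild the example data, with an NA added to the numeric column
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c", "a", "d", "a", "b"))
n1 <- 1:11
n1[5] <- NA
df1 <- data.frame(f1, n1, f2)

is_fac <- vapply(df1, is.factor, logical(1))
# Only the factor columns are rewritten; the NA in n1 stays a numeric NA
df1[is_fac] <- lapply(df1[is_fac], function(x) {
  levels(x) <- c(levels(x), "No Answer")
  x[is.na(x)] <- "No Answer"
  x
})
str(df1)
```

lapply() over df1[is_fac] returns a list that replaces just those columns, so no index loop is needed.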
