Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

R: For each row, get the column names where the data equals a certain value

I have a data frame (df) with 7 rows and 4 columns (named c1, c2, c3, c4):

c1  c2  c3  c4
Yes No  Yes No    
Yes Yes No  No    
No  Yes No  No    
Yes No  No  No    
Yes No  Yes No    
Yes No  No  No    
No  No  Yes No

I want to add a fifth column named Expected_Result that lists, for each row, the names of the columns (c1 to c4) whose value is "Yes". For example, row 1 has "Yes" in column c1 and column c3, so I would concatenate those two column names, giving "c1, c3".

Here are the full expected results:

c1, c3    
c1, c2    
c2    
c1    
c1, c3    
c1    
c3

I have the following line of code, but something is not quite right:

df$Expected_Result <- colnames(df)[apply(df,1,which(LETTERS="Unfit"))]

Answer

We can loop with apply over the rows (MARGIN = 1) of the logical matrix df == 'Yes', convert each row to numeric indices with which, take their names, and paste them together with toString, which is a wrapper for paste(., collapse = ', '). We also need an if/else check for whether a row contains any 'Yes' at all; if it does not, the function returns NA.
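To see the building blocks on their own, here is a small sketch on a single row. It assumes the row is already a named logical vector, which is what apply() hands to the function for each row of df == 'Yes'; the vector x below is a hand-written stand-in for row 1 of the question's data.

```r
# Hand-written stand-in for row 1 of df == 'Yes' (a named logical vector)
x <- c(c1 = TRUE, c2 = FALSE, c3 = TRUE, c4 = FALSE)

idx    <- which(x)        # named integer positions of the TRUE values (c1 = 1, c3 = 3)
labels <- names(idx)      # the column names "c1" and "c3"

joined <- toString(labels)                 # "c1, c3"
same   <- paste(labels, collapse = ", ")   # the same string built with paste()
print(joined)
```

toString() is only a convenience here; paste() with collapse = ", " produces the identical string.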

df$Expected_Result <- apply(df == 'Yes', 1, function(x) {
  if (any(x)) toString(names(which(x))) else NA
})
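Below is a self-contained sketch of the apply() approach. It rebuilds the question's data with data.frame() and, purely for illustration, appends a hypothetical eighth row that contains no "Yes", so the NA branch is exercised as well.

```r
# Question's data (rows 1-7) plus a hypothetical row 8 with no "Yes"
df <- data.frame(
  c1 = c("Yes", "Yes", "No",  "Yes", "Yes", "Yes", "No",  "No"),
  c2 = c("No",  "Yes", "Yes", "No",  "No",  "No",  "No",  "No"),
  c3 = c("Yes", "No",  "No",  "No",  "Yes", "No",  "Yes", "No"),
  c4 = c("No",  "No",  "No",  "No",  "No",  "No",  "No",  "No"),
  stringsAsFactors = FALSE
)

# For each row of the logical matrix, join the names of the TRUE columns,
# or return NA when the row has no "Yes" at all
df$Expected_Result <- apply(df == "Yes", 1, function(x) {
  if (any(x)) toString(names(which(x))) else NA
})

print(df)
```

Rows 1 to 7 reproduce the expected results from the question, and the hypothetical row 8 gets NA.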

Another option is to get the row/column indices of every 'Yes' with which(..., arr.ind = TRUE). Grouping by the row index (indx[, 1]), we paste together the matching column names of df to get val. Rows without any 'Yes' element do not appear in val, so the rows are matched against names(val) and ifelse puts NA in the rows that are missing.

 indx <- which(df == 'Yes', arr.ind = TRUE)   # (row, col) of every 'Yes'
 val  <- tapply(names(df)[indx[, 2]], indx[, 1], FUN = toString)
 pos  <- match(seq_len(nrow(df)), names(val)) # NA for rows with no 'Yes'
 df$Expected_Result <- ifelse(is.na(pos), NA, val[pos])
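The following sketch runs the arr.ind approach on a small hypothetical data frame in which the middle row has no "Yes", to show how the missing row is handled. Each row is looked up in val by position via match(), because ifelse() takes element i of its yes argument for row i; passing val directly would shift later values into the gap once a row is missing.

```r
# Hypothetical 3-row example in which row 2 contains no "Yes"
df <- data.frame(
  c1 = c("Yes", "No", "No"),
  c2 = c("No",  "No", "Yes"),
  c3 = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

indx <- which(df == "Yes", arr.ind = TRUE)                       # (row, col) of each "Yes"
val  <- tapply(names(df)[indx[, 2]], indx[, 1], FUN = toString)  # named by row: "1", "3"

# Align val to every row by name; rows absent from val get NA
pos <- match(seq_len(nrow(df)), names(val))
df$Expected_Result <- ifelse(is.na(pos), NA, val[pos])

print(df)
```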

data

df <- structure(list(c1 = c("Yes", "Yes", "No", "Yes", "Yes", "Yes", 
"No"), c2 = c("No", "Yes", "Yes", "No", "No", "No", "No"), c3 = c("Yes", 
"No", "No", "No", "Yes", "No", "Yes"), c4 = c("No", "No", "No", 
"No", "No", "No", "No")), .Names = c("c1", "c2", "c3", "c4"),
class =    "data.frame", row.names = c(NA, -7L))
