Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - How to generate a linear combination of variables and update table using data.table in a loop call?

Some toy data

set.seed(123)
df <- data.frame(what_ever = rnorm(5, 50, 1),
                 this_is = rnorm(5, 30, 1),
                 wtf_nnn = rnorm(5, 20, 1),
                 hat_ever = rnorm(5, 50, 1),
                 who_is = rnorm(5, 30, 1),
                 mmm_nnn = rnorm(5, 20, 1)                 
                 )


library(data.table)
DT <- data.table(df)

str(DT)
Classes ‘data.table’ and 'data.frame':  5 obs. of  6 variables:

How can I generate new variables in the data.table that are the result of the following using a loop?

New_Var_1 = what_ever/hat_ever
New_Var_2 = this_is/who_is
New_Var_3 = wtf_nnn/mmm_nnn

Here I split the column names into numerators and denominators:

nm <- names(df)
nm1 <- nm[1:3]
nm2 <- nm[4:6]

I would like to update DT this way, and then loop through i:

i <- 1

New_Var_names <- paste("New_Var_", i, sep = "")
New_Var <- sprintf("%s/%s", nm1[i], nm2[i])

None of these three attempts works:

DT[,New_Var_names := New_Var]
DT[,cat(New_Var_names) := cat(New_Var)]
DT[,eval(New_Var_names) := eval(New_Var)]

1 Reply


I'd recommend using set() in a for loop to do this, but in the current stable (CRAN) version, 1.8.10, set() cannot add new columns. So I'd create the columns with := first and then fill them in:

require(data.table)
out_names <- paste("newvar", 1:3, sep="_")
DT[, c(out_names) := 0]

invar1 <- names(DT)[1:3]
invar2 <- names(DT)[4:6]

for (i in seq_along(invar1)) {
    set(DT, i=NULL, j=out_names[i], value=DT[[invar1[i]]]/DT[[invar2[i]]])
}

In the current devel version (1.8.11), set() can add new columns, so the initial assignment with := is no longer needed. That is:

require(data.table)
out_names <- paste("newvar", 1:3, sep="_")

invar1 <- names(DT)[1:3]
invar2 <- names(DT)[4:6]

for (i in seq_along(invar1)) {
    set(DT, i=NULL, j=out_names[i], value=DT[[invar1[i]]]/DT[[invar2[i]]])
}
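
Since the question tried to use := with names held in variables, here is a minimal sketch of that form as well. It assumes a data.table version that accepts a parenthesised expression on the left of := (true of recent releases, but not checked against 1.8.10). The toy values and the names new_names and k are made up for illustration; wrapping the left side in parentheses makes data.table evaluate it to a column name, and get() looks each input column up by name.

```r
library(data.table)

# Toy table with the question's column layout (values are illustrative only)
DT <- data.table(what_ever = c(50, 51, 49), this_is = c(30, 29, 31),
                 wtf_nnn   = c(20, 21, 19), hat_ever = c(25, 17, 7),
                 who_is    = c(15, 29, 62), mmm_nnn  = c(10, 7, 19))

nm1 <- names(DT)[1:3]                            # numerator columns
nm2 <- names(DT)[4:6]                            # denominator columns
new_names <- paste0("New_Var_", seq_along(nm1))  # hypothetical output names

for (k in seq_along(nm1)) {
  # (new_names[k]) is evaluated to a string and used as the new column name;
  # get() fetches the input columns whose names are stored in nm1 and nm2
  DT[, (new_names[k]) := get(nm1[k]) / get(nm2[k])]
}
```

The loop variable is called k rather than i only to avoid confusion with the i argument of [.data.table.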

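For completeness, another way is:

EVAL = function(...) eval(parse(text=paste0(...)))  # helper: paste the pieces into one string, then parse and evaluate it

# Build all three names and expressions up front (nm1 and nm2 are from the question)
New_Var_names <- paste("New_Var_", 1:3, sep = "")
New_Var <- sprintf("%s/%s", nm1, nm2)

for (i in 1:3)
    EVAL("DT[,", New_Var_names[i], ":=", New_Var[i], "]")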

This is more general in that you can also vary the operator / in the sprintf and vary the by= clause too, etc. It's similar to constructing a dynamic SQL statement, if that helps. If you wanted to log the dynamic query being executed, you could add a cat in your definition of EVAL.
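
The logging idea could look roughly like the sketch below. EVAL_LOG is a hypothetical name for a variant of the helper, and the two-column table is made-up data; the only change is that the generated statement is printed with cat before it is parsed and evaluated.

```r
library(data.table)

# Small made-up table, just to have something to update
DT <- data.table(a = c(10, 20, 30), b = c(2, 4, 5))

# Hypothetical variant of EVAL: print the generated statement, then run it
EVAL_LOG <- function(...) {
  query <- paste0(...)
  cat("Running:", query, "\n")  # log the dynamic query
  eval(parse(text = query))     # DT is found in the enclosing environment
}

# := updates DT by reference, so no assignment of the result is needed
EVAL_LOG("DT[, ", "ratio", " := ", "a / b", "]")
```

Because the whole statement is an ordinary string, the operator or a by= clause can be swapped in the same way as the column names.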

