Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
397 views
in Technique[技术] by (71.8m points)

main - Specifying formula in R with glm without explicit declaration of each covariate

I would like to force specific variables into glm regressions without fully specifying each one. My real data set has ~200 variables. I haven't been able to find samples of this in my online searching thus far.

For example (with just 3 variables):

n=200
set.seed(39) 
samp = data.frame(W1 = runif(n, min = 0, max = 1), W2=runif(n, min = 0, max = 5)) 
samp = transform(samp, # add A
A = rbinom(n, 1, 1/(1+exp(-(W1^2-4*W1+1))))) 
samp = transform(samp, # add Y
Y = rbinom(n, 1,1/(1+exp(-(A-sin(W1^2)+sin(W2^2)*A+10*log(W1)*A+15*log(W2)-1+rnorm(1,mean=0,sd=.25))))))

If I want to include all main terms, this has an easy shortcut:

glm(Y~., family=binomial, data=samp)

But say I want to include all main terms (W1, W2, and A) plus W2^2:

glm(Y~A+W1+W2+I(W2^2), family=binomial, data=samp)

Is there a shortcut for this?

[editing self before publishing:] This works! glm(formula = Y ~ . + I(W2^2), family = binomial, data = samp)

Okay, so what about this one!

I want to omit one main terms variable and include only two main terms (A, W2) and W2^2 and W2^2:A:

glm(Y~A+W2+A*I(W2^2), family=binomial, data=samp)

Obviously with just a few variables no shortcut is really needed, but I work with high dimensional data. The current data set has "only" 200 variables, but some others have thousands and thousands.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your use of . creatively to build the formula containing all or almost all variables is a good and clean approach. Another option that is useful sometimes is to build the formula programatically as a string, and then convert it to formula using as.formula:

vars <- paste("Var",1:10,sep="")
fla <- paste("y ~", paste(vars, collapse="+"))
as.formula(fla)

Of course, you can make the fla object way more complicated.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...