Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - How does the subset argument work in the lm() function?

I have been trying to figure out how the subset argument in R's lm() function works. In particular, the following code puzzles me:

 data(mtcars)
 summary(lm(mpg ~ wt,  data=mtcars))
 summary(lm(mpg ~ wt, cyl, data=mtcars))

In both cases the regression uses 32 observations:

  dim(lm(mpg ~ wt, cyl, data=mtcars)$model)
  [1] 32  2
  dim(lm(mpg ~ wt, data=mtcars)$model)
  [1] 32  2

Yet the coefficients change (along with R²). The help page doesn't say much about this:

subset an optional vector specifying a subset of observations to be used in the fitting process


1 Reply


As a general principle, a vector used for subsetting can be either logical (a TRUE or FALSE for every element) or numeric (the positions of the elements to keep). If it is numeric, R includes an element once for every time its position appears in the index vector, so elements can be repeated; this is handy for things like resampling.
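A minimal sketch of the two indexing rules, using a hypothetical toy vector x and data frame df that exist only for illustration:

```r
# Hypothetical toy vector, used only to illustrate indexing rules
x <- c("a", "b", "c")

# Numeric index: each number is a position, and repeated positions are kept
num_pick <- x[c(2, 2, 1)]
print(num_pick)

# Logical index: one TRUE/FALSE per element, keeps the elements marked TRUE
lgl_pick <- x[c(TRUE, FALSE, TRUE)]
print(lgl_pick)

# The same rules apply to the rows of a data frame (hypothetical df)
df <- data.frame(id = 1:3, val = c(10, 20, 30))
rows_num <- df[c(3, 3, 1), ]
print(rows_num)
```

The numeric index returns "b" "b" "a" (element 2 twice), while the logical index returns "a" "c".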

Let's take a look at cyl:

> mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

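Because data is given by name, the unnamed cyl is matched positionally to the next argument of lm(), which is subset (the argument order is formula, data, subset, ...). Since cyl is numeric, its values are used as row numbers. So you're getting a data frame with the same number of rows (32, one per value of cyl), but it is made up of row 6, row 6, row 4, row 6, and so on. Only rows 4, 6 and 8 of mtcars ever appear, each repeated many times, which is why the coefficients and R² change.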

You can see this if you do the subsetting yourself:

> head(mtcars[mtcars$cyl,])
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Valiant.1      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant.2      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Valiant.3      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
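As a sketch, the fit with subset = cyl can be compared against a fit on the manually indexed data frame; the two should give the same coefficients, and the subset fit should differ from the fit on the full data:

```r
data(mtcars)

# subset = cyl: cyl holds 4, 6 or 8, so its values act as row numbers
fit_subset <- lm(mpg ~ wt, data = mtcars, subset = cyl)

# The same fit, doing the numeric row indexing by hand
fit_manual <- lm(mpg ~ wt, data = mtcars[mtcars$cyl, ])

# The ordinary fit on all 32 distinct cars, for comparison
fit_full <- lm(mpg ~ wt, data = mtcars)

print(all.equal(coef(fit_subset), coef(fit_manual)))
print(nrow(model.frame(fit_subset)))
print(rbind(subset = coef(fit_subset), full = coef(fit_full)))

# Only these three rows of mtcars end up in the subset fit
print(rownames(mtcars)[sort(unique(mtcars$cyl))])
```

The model frame still has 32 rows, but they are copies of just three cars (Hornet 4 Drive, Valiant and Merc 240D).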

Did you mean to fit the model only to the 6-cylinder cars? Then pass a logical condition, which keeps exactly the rows where it is TRUE:

 summary(lm(mpg ~ wt, cyl==6, data=mtcars))
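A short sketch checking that the logical subset keeps only the 6-cylinder rows and matches a fit on the pre-filtered data (here filtered with base R's subset() function):

```r
data(mtcars)

# Logical subset: keeps only the rows where cyl == 6 is TRUE
fit_six <- lm(mpg ~ wt, data = mtcars, subset = cyl == 6)
print(nrow(model.frame(fit_six)))

# Equivalent fit on a data frame filtered beforehand
fit_six_manual <- lm(mpg ~ wt, data = subset(mtcars, cyl == 6))
print(all.equal(coef(fit_six), coef(fit_six_manual)))

print(summary(fit_six))
```

This time the model frame has 7 rows, one for each 6-cylinder car, with no duplicates.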
