R: lm and biglm producing different answers

Question

posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

Why are "lm" and "biglm" producing different estimates? Consider the code below:

a <- data.frame(y = rnorm(1000000), x1 = rnorm(1000000), x2 = rnorm(1000000))
m1 <- lm(y ~ x1 + x2, data = a); summary(m1)

library(biglm)
m2 <- biglm(y ~ x1 + x2, data = a); summary(m2)

It makes no difference if biglm processes in chunks or not - the final estimates are different from that produced by lm.
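
The chunked variant mentioned here can be sketched as follows. This is only an illustration under assumptions: the seed, the reduced row count, and the four-way split are arbitrary choices that do not come from the original post, and the biglm part runs only if the package is installed. Both fits are read through coef() so that they are compared on the same scale.

```r
set.seed(1)   # arbitrary seed, only so the sketch is reproducible
n <- 10000    # smaller than the 1e6 rows in the question, to keep this fast
a <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))

# Reference fit on the full data set
m1 <- lm(y ~ x1 + x2, data = a)

if (requireNamespace("biglm", quietly = TRUE)) {
  # Hypothetical split into 4 chunks; any partition of the rows would do
  chunks <- split(a, rep(1:4, length.out = n))

  # Fit on the first chunk, then feed the remaining chunks one at a time
  m2 <- biglm::biglm(y ~ x1 + x2, data = chunks[[1]])
  for (ch in chunks[-1]) {
    m2 <- update(m2, ch)
  }

  # Side-by-side view of the two coefficient vectors
  print(cbind(lm = coef(m1), biglm = coef(m2)))
}
```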

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:23:27+0000

Posting as an answer simply due to length:

m2$qr

$D
[1] 1.000000e+06 1.001150e+06 9.993772e+05

$rbar
[1] -8.581350e-04 -8.116662e-04 -1.225233e-03  

$thetab
[1]  7.863159e-04 -4.276900e-04 -1.552812e-03   # these are the coefficients

Rgames: m1$coefficients
  (Intercept)            x1            x2 
 7.846869e-04 -4.295926e-04 -1.552812e-03

So, yes, the two sets of printed values differ slightly. For example, the intercepts differ by about 0.2%. Whether a difference of this size has any effect on the quality of your fitted line depends on what you intend to do with the fit. Integration? Guaranteed no problem. Extrapolation? Always risky, but not because the x1 slope differs by about 0.4%.
I would strongly recommend that, at the very least, you run some test cases that fit, say, f(x) = g(x) + runif(N) and h(x) = g(x) + runif(N) (each call to runif returns a different set of random values), and see whether lm and biglm return coefficients that differ significantly from the original g(x) values.
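
One thing worth checking before treating these as two competing estimates: the printed thetab and the lm coefficients agree exactly in the last entry. That is the pattern back-substitution would produce if thetab lives in the orthogonalised basis of the incremental QR and rbar holds the off-diagonal entries of a unit upper-triangular factor, stored row by row. That storage layout is an assumption here, not something the printout states. Under it, a minimal sketch using only the numbers printed above:

```r
# Values copied from the m2$qr printout
rbar   <- c(-8.581350e-04, -8.116662e-04, -1.225233e-03)
thetab <- c( 7.863159e-04, -4.276900e-04, -1.552812e-03)

# Assumption: rbar is the strict upper triangle of a unit upper-triangular
# matrix R, stored row by row as R[1,2], R[1,3], R[2,3]
R <- diag(3)
R[1, 2] <- rbar[1]
R[1, 3] <- rbar[2]
R[2, 3] <- rbar[3]

# Solve R %*% beta = thetab by back-substitution
beta <- backsolve(R, thetab)
names(beta) <- c("(Intercept)", "x1", "x2")
print(beta)

# Values copied from the m1$coefficients printout
lm_coef <- c(7.846869e-04, -4.295926e-04, -1.552812e-03)

# Compare the back-substituted vector with the lm printout
print(all.equal(unname(beta), lm_coef, tolerance = 1e-5))
```

With the values above, the back-substituted vector matches the lm printout to the printed precision. If this reading is right, the like-for-like check in a live session is coef(m1) against coef(m2), rather than against m2$qr$thetab.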
