Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How do I apply the regression for subgroups within the population?

Let's say I have the following data frame

weight <- c(100, 137, 158, 225, 149)
age <- c(15, 18, 21, 31, 65)
gender <- c("Female", "Male", "Male", "Male", "Female")
table <- data.frame(weight, age, gender)

If I wanted to run a linear regression to see how weight predicts age, and then examine the fit, I'd do:

allData <- lm(age ~ weight, data = table)
summary(allData)

What do I do if I want to examine how weight predicts age for females only, that is, fit the model using only the female rows of the data? I'm thinking of something like:

FemaleData <- lm(age ~ weight, data=table (gender="Female"))

1 Reply

library(dplyr)
library(broom)

# example dataset
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
table <- data.frame(weight, age, gender)

# build model for each gender value and store it in a column
table %>%
  group_by(gender) %>%                                  # for each gender value
  do(model = summary(lm(age ~ weight, data = .))) %>%   # build a model
  ungroup() -> tbl_models

# check what your new dataset looks like
tbl_models

# # A tibble: 2 x 2
#     gender            model
#   * <fctr>           <list>
#   1 Female <S3: summary.lm>
#   2   Male <S3: summary.lm>

# access / view model for Females
tbl_models %>% filter(gender == "Female") %>% pull(model)

# [[1]]
# 
# Call:
#   lm(formula = age ~ weight, data = .)
# 
# Residuals:
#   1          2          3 
# -0.0002125 -0.0101997  0.0104122 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
#   (Intercept) -8.706e+01  4.943e-02   -1761 0.000361 ***
#   weight       1.021e+00  3.681e-04    2773 0.000230 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.01458 on 1 degrees of freedom
# Multiple R-squared:      1,   Adjusted R-squared:      1 
# F-statistic: 7.69e+06 on 1 and 1 DF,  p-value: 0.0002296

# build model for each gender value and store it as a tidy dataset
table %>%
  group_by(gender) %>%
  do(tidy(lm(age ~ weight, data = .))) %>%
  ungroup()

# # A tibble: 4 x 6
#   gender        term    estimate    std.error   statistic      p.value
#   <fctr>       <chr>       <dbl>        <dbl>       <dbl>        <dbl>
# 1 Female (Intercept) -87.0609860 0.0494272875 -1761.39518 0.0003614292
# 2 Female      weight   1.0206120 0.0003680516  2773.01334 0.0002295769
# 3   Male (Intercept)  -2.3370680 0.2181313917   -10.71404 0.0592475719
# 4   Male      weight   0.1480985 0.0012299556   120.40961 0.0052869963
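
If only one subgroup is needed, as in the question, base R may be enough without dplyr or broom. The sketch below is not part of the original answer: it assumes the same six-row example data, renamed to a hypothetical `people` data frame so that it does not shadow `base::table`, and uses the `subset` argument of `lm()`. The last two lines show one possible base-R way to fit a model per gender with `split()` and `lapply()`.

```r
# Example data from the answer above
# (named `people` here, an illustrative name, to avoid shadowing base::table)
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
people <- data.frame(weight, age, gender)

# Fit the model on the female rows only; `subset` is evaluated inside `data`
female_fit <- lm(age ~ weight, data = people, subset = gender == "Female")
summary(female_fit)

# Equivalent: filter the data frame first, then fit
female_fit2 <- lm(age ~ weight, data = people[people$gender == "Female", ])

# One model per gender without extra packages
fits <- lapply(split(people, people$gender), function(d) lm(age ~ weight, data = d))
sapply(fits, coef)  # coefficients side by side, one column per gender
```

The `subset` form keeps the full data frame intact and records the filter in the model call, while the `split()` form scales to any number of groups. With so few rows per group the fits have almost no residual degrees of freedom, so the summaries are only useful for checking that the code works.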
