r - Creating a large national dataset using Tidycensus over multiple years

I'm using tidycensus to pull dissertation data for three different years (decennial 2000, ACS 2009-2013, ACS 2015-2019) for all census tracts in the country.

Based on Kyle Walker's tutorial, I've been able to use the map_df function to build the call below, which works. The result is a data frame containing every variable listed in the vector for every census tract in the country:

# load packages: tidycensus for the Census API helpers, purrr for map_df
# (assumes a Census API key has already been set with census_api_key())
library(tidycensus)
library(purrr)

# get vector of state FIPS codes for the US (50 states + DC)
us <- unique(fips_codes$state)[1:51]



# select my variables
my_vars19 <- c(pop = "B01003_001", 
               racetot = "B03002_001", 
               nhtot = "B03002_002", 
               nhwht = "B02001_002", 
               nhblk = "B02001_003", 
               nhnat = "B02001_004", 
               nhasian = "B02001_005", 
               nhpac = "B02001_006", 
               nhother = "B02001_007",
               nhtwo = "B02001_008", 
               hisp = "B03003_003",             
               male = "B01001_002",
               female = "B01001_026")



# function call to obtain tracts for US
acs2019 <- map_df(us, function(x) {
  get_acs(geography = "tract",
          variables = my_vars19,
          state = x)
})

glimpse(acs2019)

Rows: 949,728
Columns: 5
$ GEOID    <chr> "01001020100", "01001020100", "01001020100", "01001020100", "01001020100", "01001020100", "…
$ NAME     <chr> "Census Tract 201, Autauga County, Alabama", "Census Tract 201, Autauga County, Alabama", "…
$ variable <chr> "male", "female", "pop", "nhwht", "nhblk", "nhnat", "nhasian", "nhpac", "nhother", "nhtwo",…
$ estimate <dbl> 907, 1086, 1993, 1685, 152, 0, 2, 0, 0, 154, 1993, 1967, 26, 1058, 901, 1959, 759, 1117, 0,…
$ moe      <dbl> 118, 178, 225, 202, 78, 12, 5, 12, 12, 120, 225, 226, 36, 137, 133, 202, 113, 180, 12, 12, …

This is just a practice call, though. I need to pull close to 150 to 200 variables for each year of analysis (2000, 2009-2013, and 2015-2019). I'm worried that pulling that many variables for every state and census tract will be very taxing on the API, and I also believe there is a limit on the number of variables you can request at once.
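
For context, my plan for the year-specific calls looks roughly like the sketch below: get_decennial() for 2000 and get_acs() with the year and survey arguments for the two ACS periods. The 2000 code "P001001" (total population in Summary File 1) and my_vars13 are stand-ins for variable lists I still have to assemble, so treat this as an outline rather than tested code.

# decennial 2000 (Summary File 1); variable codes differ from the ACS,
# and "P001001" here is just a placeholder for my eventual 2000 list
dec2000 <- map_df(us, function(x) {
  get_decennial(geography = "tract",
                variables = c(pop = "P001001"),
                state = x,
                year = 2000,
                sumfile = "sf1")
})

# ACS 2009-2013 5-year estimates; my_vars13 would be a 2013-vintage
# counterpart to my_vars19
acs2013 <- map_df(us, function(x) {
  get_acs(geography = "tract",
          variables = my_vars13,
          state = x,
          year = 2013,
          survey = "acs5")
})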

I could group the calls by type of variable, but I worry that breaking them into groups could get unwieldy, and I'd also need to combine the results afterwards. What is the standard practice for creating a large dataset with tidycensus?
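
If I did break the variables into groups, I imagine it would look something like the sketch below. The chunk size of 25 is an arbitrary guess at something safely under the API's per-request cap, and my_vars_all stands in for the full 150-200-variable vector; because every call returns the same long format (GEOID, NAME, variable, estimate, moe), the pieces should simply stack:

# hypothetical full variable vector for one year (150-200 named codes)
my_vars_all <- c(my_vars19)   # ...plus the remaining codes

# split into chunks of 25 codes each; split() keeps the names
var_chunks <- split(my_vars_all, ceiling(seq_along(my_vars_all) / 25))

# loop over states, then over chunks, and stack the long-format results
acs2019_full <- map_df(us, function(x) {
  map_df(var_chunks, function(v) {
    get_acs(geography = "tract",
            variables = v,
            state = x,
            year = 2019)
  })
})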

Do people usually break up their calls, or do they request whole tables instead? Or is there a more efficient approach than the ones I've outlined? I know most people use tidycensus to pull a handful of variables, but what do they do when they need to pull a lot?
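
For the table option, my understanding is that get_acs() accepts a table argument that returns every variable in a given table, so one call per table could replace a block of individual codes. For example, with B03002 (Hispanic or Latino origin by race):

# pull every variable in table B03002 for all tracts, state by state
hisp_by_race <- map_df(us, function(x) {
  get_acs(geography = "tract",
          table = "B03002",
          state = x,
          year = 2019)
})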

Question from: https://stackoverflow.com/questions/65950551/creating-a-large-national-dataset-using-tidycensus-over-multiple-years
