Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
284 views
in Technique[技术] by (71.8m points)

weighted join or match in R

I am working with election data from California's Statewide Database (https://statewidedatabase.org/election.html). I am trying to convert their precinct level election results to 2010 census block level results. I have the precinct level election results

> sov_results
# A tibble: 20,744 x 136
   COUNTY FIPS  SRPREC_KEY SRPREC ADDIST CDDIST SDDIST BEDIST TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
    <dbl> <chr> <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1     49 06097 060971002    1002      2      5      2      2     29      0      0      0      0      0      0      0      0      0      18
 2     49 06097 060971003    1003      2      2      2      2      1      0      0      0      0      0      0      0      0      0       0
 3     49 06097 060971005    1005      2      2      2      2    106      0      0      0      0      0      0      0      0      0      67
 4     49 06097 060971006    1006      2      5      2      2      2      0      0      0      0      0      0      0      0      0       2
 5     49 06097 060971007    1007      2      2      2      2     56      0      0      0      0      0      0      0      0      0      42
 6     49 06097 060971008    1008      2      5      2      2    148      0      0      0      0      0      0      0      0      0     109
 7     49 06097 060971009    1009      2      5      2      2    137      0      0      0      0      0      0      0      0      0      97
 8     49 06097 060971012    1012      2      5      2      2     21      0      0      0      0      0      0      0      0      0      16
 9     49 06097 060971017    1017      4      5      2      2    723      0      0      0      0      0      0      0      0      0     591
10     49 06097 060971018    1018      2      2      2      2     14      0      0      0      0      0      0      0      0      0      10
# ... with 20,734 more rows, and 117 more variables: DEMVOTE <dbl>, REPVOTE <dbl>, AIPVOTE <dbl>, GRNVOTE <dbl>, LIBVOTE <dbl>,
#   NLPVOTE <dbl>, REFVOTE <dbl>, DCLVOTE <dbl>, MSCVOTE <dbl>, PRCVOTE <dbl>, ABSVOTE <dbl>, ASSDEM01 <dbl>, ASSDEM02 <dbl>,
#   ASSDEM03 <dbl>, ASSDEM04 <dbl>, ASSDEM05 <dbl>, ASSDEM06 <dbl>, ASSDEM07 <dbl>, ASSDEM08 <dbl>, ASSGRN01 <dbl>, ASSIND01 <dbl>,
#   ASSLIB01 <dbl>, ASSPAF01 <dbl>, ASSREP01 <dbl>, ASSREP02 <dbl>, ASSREP03 <dbl>, ASSREP04 <dbl>, CNGAIP01 <dbl>, CNGDEM01 <dbl>,
#   CNGDEM02 <dbl>, CNGDEM03 <dbl>, CNGDEM04 <dbl>, CNGDEM05 <dbl>, CNGDEM06 <dbl>, CNGDEM07 <dbl>, CNGDEM08 <dbl>, CNGDEM09 <dbl>,

As well as the conversion key with the weights.

> conversion
# A tibble: 398,299 x 13
   SRPREC FIPS  ELECTION TYPE   SRPREC_KEY BLOCK_KEY        TRACT BLOCK BLKREG SRTOTREG PCTSRPREC BLKTOTREG PCTBLK
    <dbl> <chr> <chr>    <chr>  <chr>      <chr>            <dbl> <dbl>  <dbl>    <dbl>     <dbl>     <dbl>  <dbl>
 1     NA 06097 p20      sr_blk 06097nan   060970000000000      0     0      1       NA     NA            1  100  
 2   1002 06097 p20      sr_blk 060971002  060971525011014 152501  1014     26       29     89.7         26  100  
 3   1002 06097 p20      sr_blk 060971002  060971525013008 152501  3008      3       29     10.3          3  100  
 4   1003 06097 p20      sr_blk 060971003  060971526005068 152600  5068      1        1    100            1  100  
 5   1005 06097 p20      sr_blk 060971005  060971526005000 152600  5000     14      106     13.2         43   32.6
 6   1005 06097 p20      sr_blk 060971005  060971526005003 152600  5003     12      106     11.3         12  100  
 7   1005 06097 p20      sr_blk 060971005  060971526005004 152600  5004     12      106     11.3         20   60  
 8   1005 06097 p20      sr_blk 060971005  060971526005006 152600  5006      5      106      4.72         5  100  
 9   1005 06097 p20      sr_blk 060971005  060971526005008 152600  5008     24      106     22.6         24  100  
10   1005 06097 p20      sr_blk 060971005  060971526005020 152600  5020     28      106     26.4         28  100 

I want to know how to match these precinct results to the census block in such a way that the census block is given the right amount of votes from the precinct results (based on the PCTSRPREC column which indicates what percentage of of the precinct belongs in the census block).

For example, I would want to join so that 13.2% of SRPREC_KEY 060971005 is assigned to BLOCK 5000. That would be 13.2% of the TOTVOTE (rounded to a whole number), 13.2% of DEMVOTE, 13.2% of ASSDEM03 vote, etc. Is there a function or way to do this in R?

question from:https://stackoverflow.com/questions/65891290/weighted-join-or-match-in-r

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think you're looking for a join/merge operation and then a simple multiply.

library(dplyr)
select(conversion, SRPREC_KEY, BLOCK, PCTSRPREC) %>%
  left_join(., sov_results, by = "SRPREC_KEY") %>%
  mutate(across(TOTREG:TOTVOTE, ~ . * PCTSRPREC / 100))
#    SRPREC_KEY BLOCK PCTSRPREC COUNTY FIPS SRPREC ADDIST CDDIST SDDIST BEDIST  TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
# 1    06097nan     0        NA     NA   NA     NA     NA     NA     NA     NA      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      NA
# 2   060971002  1014     89.70     49 6097   1002      2      5      2      2 26.0130      0      0      0      0      0      0      0      0      0 16.1460
# 3   060971002  3008     10.30     49 6097   1002      2      5      2      2  2.9870      0      0      0      0      0      0      0      0      0  1.8540
# 4   060971003  5068    100.00     49 6097   1003      2      2      2      2  1.0000      0      0      0      0      0      0      0      0      0  0.0000
# 5   060971005  5000     13.20     49 6097   1005      2      2      2      2 13.9920      0      0      0      0      0      0      0      0      0  8.8440
# 6   060971005  5003     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 7   060971005  5004     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 8   060971005  5006      4.72     49 6097   1005      2      2      2      2  5.0032      0      0      0      0      0      0      0      0      0  3.1624
# 9   060971005  5008     22.60     49 6097   1005      2      2      2      2 23.9560      0      0      0      0      0      0      0      0      0 15.1420
# 10  060971005  5020     26.40     49 6097   1005      2      2      2      2 27.9840      0      0      0      0      0      0      0      0      0 17.6880

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...