Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
330 views
in Technique[技术] by (71.8m points)

data.table - Find closest Zip Code to another from two data tables in R

I have two separate data tables that I want to find the closet zip code from one data table to the other. From the SiteZip table, I want the zip codes to loop through the ConsumerZip table to obtain the shortest distance between zip codes. I used the zipcodeR package to assign lat and long to it. The two tables are below. The first table are the locations of stores. The other table are customer locations. I want to be able to create a single table that shows the closest store for each customer. I have researched for a couple days and haven't found many requests that match what I'm looking for. Ultimately I would map the store locations with a density circle by customer location. Thank you in advance for any help you can provide.

head(StoreZip)
   Zip  Store         Address1        City State   lat    lng
1: 01026 11111 151 South Street  Cummington    MA 42.48 -72.93
2: 01040 11112    303 Beech St.     Holyoke    MA 42.22 -72.64
3: 01104 11113  417 Liberty St. Springfield    MA 42.13 -72.57
4: 01104 11114    2155 Main St. Springfield    MA 42.13 -72.57
5: 01301 11115   55 Federal St.  Greenfield    MA 42.63 -72.59
6: 01301 11116     1 Arch Place  Greenfield    MA 42.63 -72.59```

```head(CustomersZip)
    Zip         ID         Address1       City    State lat    lng
1: 01001    65484654805 1548 MAIN STREET AGAWAM    MA 42.07 -72.63
2: 01001    64846124846    569 MAPLE ST  AGAWAM    MA 42.07 -72.63
3: 01001    68421548945 68 PLANTATION DR AGAWAM    MA 42.07 -72.63
4: 01001    84051545484   154 South DR   AGAWAM    MA 42.07 -72.63
5: 01001    97545154848   985 Main St    AGAWAM    MA 42.07 -72.63
6: 01002    64841515484    54 KING ST    PELHAM    MA 42.38 -72.52    
question from:https://stackoverflow.com/questions/66066717/find-closest-zip-code-to-another-from-two-data-tables-in-r

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's one solution for mapping each CustomersZip$ID with the closest StoreZip$Store:

library(data.table)
# library(geosphere) # distHaversine
CustomersZip[
  , Store := StoreZip$Store[
      which.min(geosphere::distHaversine(
        cbind(first(lng), first(lat)),
        StoreZip[, cbind("lng", "lat"), with = FALSE])) ]
  , by = ID ]

CustomersZip
#      Zip          ID   lat    lng Store
#    <int>      <char> <num>  <num> <int>
# 1:  1001 65484654805 42.07 -72.63 11113
# 2:  1001 64846124846 42.07 -72.63 11113
# 3:  1001 68421548945 42.07 -72.63 11113
# 4:  1001 84051545484 42.07 -72.63 11113
# 5:  1001 97545154848 42.07 -72.63 11113
# 6:  1002 64841515484 42.38 -72.52 11112

Walk-through:

  • distHaversine operates on two arguments, typically matrices (or frames) with two columns each; how it calculates distances depends on the number of points in each of its arguments p1 and p2:
    • if p1 has one point, then it calculates all p2 points to the individual p1 point; similarly if p2 has one point;
    • if p1 and p2 have the same number of points, it calculates the distance point-wise, so that row1 with row1, row2 with row2, etc; it does not do a cartesian expansion of "row1 with row1,row2,row3,...", "row2 with row1,row2,row3,...", so that has to happen externally
  • because of that, we work one customer at a time, and find the store with the minimum distance (which.min) and store its Store id
  • though not a factor with this sample data, I chose to group this by CustomersZip$ID and use just the first lat/lng found for that customer; if it's possible that a customer will have different points, then remove the first calls, and group with by = seq_len(nrow(CustomersZip)) instead; if this step is unnecessary, it will still calculate the same results, the only loss will be some efficiency by calculating the same distances multiple times

Reusable data (a subset of what is in the question):

StoreZip <- fread("
  Zip Store   lat    lng
01026 11111 42.48 -72.93
01040 11112 42.22 -72.64
01104 11113 42.13 -72.57
01104 11114 42.13 -72.57
01301 11115 42.63 -72.59
01301 11116 42.63 -72.59")

CustomersZip <- fread("
  Zip      ID       lat    lng
01001 65484654805 42.07 -72.63
01001 64846124846 42.07 -72.63
01001 68421548945 42.07 -72.63
01001 84051545484 42.07 -72.63
01001 97545154848 42.07 -72.63
01002 64841515484 42.38 -72.52", colClasses = list(character="ID"))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...