Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to find the closest match based on 2 keys from one dataframe to another?

I am working with two dataframes. One holds a set of locations with their coordinates (latitude, longitude). The other is a weather data set with data from weather stations all over the world and their respective coordinates. I am trying to link each location in my data set to its nearest weather station. The weather station names and my location names do not match, so I cannot join on name.

I am left trying to link them by closest match in coordinates and have no idea where to begin.

I was thinking of something like

np.abs((location['Latitude']-weather['Latitude'])+(location['Longitude']-weather['Longitude']))

Examples of each

location...

Location   Latitude   Longitude Component  
     A  39.463744  -76.119411    Active   
     B  39.029252  -76.964251    Active   
     C  33.626946  -85.969576    Active   
     D  49.286337   10.567013    Active   
     E  37.071777  -76.360785    Active   

weather...

     Station Code             Station Name  Latitude  Longitude
     US1FLSL0019    PORT ST. LUCIE 4.0 NE   27.3237   -80.3111
     US1TXTV0133            LAKEWAY 2.8 W   30.3597   -98.0252
     USC00178998                  WALTHAM   44.6917   -68.3475
     USC00178998                  WALTHAM   44.6917   -68.3475
     USC00178998                  WALTHAM   44.6917   -68.3475

The desired output is a new column on the location dataframe holding the name of the closest station.

However, I am not sure how to loop through both dataframes to accomplish this. Any help would be greatly appreciated.

Thanks, Scott

1 Reply

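Let's say you have a distance function dist that you want to minimize. Take the absolute value of each coordinate difference separately; inside a single np.abs, a positive latitude difference can cancel a negative longitude difference and make a distant station look close:

import numpy as np

def dist(lat1, long1, lat2, long2):
    return np.abs(lat1-lat2) + np.abs(long1-long2)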
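
Differences in degrees are only a rough proxy for distance on the ground, since a degree of longitude shrinks toward the poles. If real distances matter, a great-circle formula such as haversine could be swapped in for dist. The following is a minimal sketch and not part of the original answer; haversine_km and EARTH_RADIUS_KM are illustrative names, and 6371 km is the commonly used mean Earth radius.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometres between points given in degrees."""
    # Convert every argument from degrees to radians (works for scalars and arrays).
    lat1, long1, lat2, long2 = map(np.radians, (lat1, long1, lat2, long2))
    # Haversine formula: a is the squared half-chord length between the points.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((long2 - long1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

Because it takes the same four arguments as dist, it can be used wherever dist is called.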

For a given location, start by applying it to every row of weather:

lat = 39.463744
long = -76.119411
weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)

This will calculate the distance to all weather stations. Using idxmin you can find the closest station name:

distances = weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)
weather.loc[distances.idxmin(), 'Station Name']

Let's put all this in a function:

def find_station(lat, long):
    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
        axis=1)
    return weather.loc[distances.idxmin(), 'Station Name']

You can now get all the nearest stations by applying it to the locations dataframe:

locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']), 
    axis=1)

Output:

0                  WALTHAM
1                  WALTHAM
2    PORT ST. LUCIE 4.0 NE
3                  WALTHAM
4    PORT ST. LUCIE 4.0 NE
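
Row-wise apply evaluates the distance once per location and station pair, which can get slow with a worldwide station list. As a hedged alternative, the sketch below computes the whole distance matrix at once with NumPy broadcasting and stores the result in a new column, which is what the question asks for. It uses the sample rows from the question and a sum of absolute coordinate differences; the column name Nearest Station and the drop_duplicates step are assumptions added here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
locations = pd.DataFrame({
    "Location": ["A", "B", "C", "D", "E"],
    "Latitude": [39.463744, 39.029252, 33.626946, 49.286337, 37.071777],
    "Longitude": [-76.119411, -76.964251, -85.969576, 10.567013, -76.360785],
})
weather = pd.DataFrame({
    "Station Code": ["US1FLSL0019", "US1TXTV0133", "USC00178998",
                     "USC00178998", "USC00178998"],
    "Station Name": ["PORT ST. LUCIE 4.0 NE", "LAKEWAY 2.8 W", "WALTHAM",
                     "WALTHAM", "WALTHAM"],
    "Latitude": [27.3237, 30.3597, 44.6917, 44.6917, 44.6917],
    "Longitude": [-80.3111, -98.0252, -68.3475, -68.3475, -68.3475],
})

# Keep one row per station; the sample repeats the same station several times.
stations = weather.drop_duplicates(subset="Station Code").reset_index(drop=True)

# An (n_locations, 1) column against an (n_stations,) row broadcasts to a
# full (n_locations, n_stations) matrix of coordinate differences.
lat_diff = np.abs(locations["Latitude"].to_numpy()[:, None]
                  - stations["Latitude"].to_numpy())
lon_diff = np.abs(locations["Longitude"].to_numpy()[:, None]
                  - stations["Longitude"].to_numpy())
distances = lat_diff + lon_diff

# For each location (row), take the column index of the smallest distance.
nearest = distances.argmin(axis=1)
locations["Nearest Station"] = stations["Station Name"].to_numpy()[nearest]
print(locations[["Location", "Nearest Station"]])
```

For the sample data this assigns WALTHAM to A, B and D, and PORT ST. LUCIE 4.0 NE to C and E. The matrix holds one value per location and station pair, so very large inputs may need to be processed in chunks.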
