Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Large-scale string matching between different DataFrames in Python

I am trying to improve the run time of my lookup-table code.

import pandas as pd

dest_df = pd.DataFrame({"dest": ["uk LHR", "from ROM", "City:LONDON", "planetoronto", " rome rome", "junk plane"]})  # 300,000 rows
city_df_lookup = pd.DataFrame({"places": ["london", " paris", "toronto", "rome"],
                               "code": ["LHR", "PAR", "YTO", "ROM"]})  # around 10,000 rows
code = city_df_lookup.code.tolist()
places = city_df_lookup.places.tolist()

def select(x):
    # Return the place whose code occurs in x; falls through to None if no code matches.
    for co, pl in zip(code, places):
        if co in x:
            return pl

dest_df["dest_match"] = dest_df["dest"].apply(select)

dest_df.head()

    dest            dest_match
0   uk LHR          london
1   from ROM        rome
2   City:LONDON     None
3   planetoronto    None
4   rome rome       None
5   junk plane      None

Unfortunately, the code above takes too long. I would also like the loop to string-match between the city_df_lookup.places and dest_df.dest columns, not only against the codes.

The desired output is:

    dest            dest_match
0   uk LHR          london
1   from ROM        rome
2   City:LONDON     london
3   planetoronto    toronto
4   rome rome       rome
5   junk plane      No Match

I was thinking of using ahocorasick, but I am not sure whether there is a simpler method.
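
For illustration, one possibly simpler route is to compile all codes into one regular expression and all place names into another, then search each destination string once per pattern instead of looping over the lookup table in Python. This is only a sketch under assumptions: the helper names (build_pattern, match_dest, code_to_place, place_keys) are made up here, codes are assumed to be matched case-sensitively as in the loop above, place names case-insensitively, and a code match is assumed to take priority over a place-name match. It has not been benchmarked against 300,000 rows and 10,000 keys.

```python
import re

# Lookup data copied from the example above (the real table has ~10,000 rows).
places = ["london", " paris", "toronto", "rome"]
codes = ["LHR", "PAR", "YTO", "ROM"]

# Hypothetical helper dicts: every searchable key maps to a cleaned place name.
code_to_place = {co: pl.strip() for co, pl in zip(codes, places)}
place_keys = {pl.strip().lower(): pl.strip() for pl in places}


def build_pattern(keys, flags=0):
    """Compile one alternation of escaped literals, longest keys first."""
    ordered = sorted(keys, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered), flags)


code_re = build_pattern(code_to_place)                # case-sensitive, like `co in x`
place_re = build_pattern(place_keys, re.IGNORECASE)   # assumption: ignore case for names


def match_dest(text, default="No Match"):
    """Return the place for the first code found, else the first place name found."""
    m = code_re.search(text)
    if m:
        return code_to_place[m.group(0)]
    m = place_re.search(text)
    if m:
        return place_keys[m.group(0).lower()]
    return default


sample = ["uk LHR", "from ROM", "City:LONDON", "planetoronto", " rome rome", "junk plane"]
results = [match_dest(s) for s in sample]
```

With the DataFrames from the question this could be applied as dest_df["dest_match"] = dest_df["dest"].map(match_dest). Whether a 10,000-way alternation is fast enough would have to be measured. If it is not, an Aho-Corasick automaton is designed for exactly this many-keys case and may scale better.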

Question from: https://stackoverflow.com/questions/65904049/large-scale-string-matching-between-different-dataframes-python


1 Reply

Waiting for answers
