Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
296 views
in Technique[技术] by (71.8m points)

python - loop through rows of one csv file to find corresponding data in another

I got an interesting problem:

file1.csv has a few hundred rows like:

Code,DTime
1,2010-12-26 17:01
2,2010-12-26 17:07
2,2010-12-26 17:15

file2.csv has about 11 million rows like:

id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600

What i'm trying to do is to write a script that takes each DTime value in file1.csv and finds the first instance of a partial match in the DateTime column of file2.csv, and outputs DateTime, Bid, Ask for that row. The partial match is on the first 16 characters.

Both files are sorted from oldest to newest, so if "2010-12-26 17:01" from file1.csv matched 4 entries in file2.csv, I only need to extract the first one: "2010-12-26 17:01:01"

Not sure how to proceed.. I tried a dictionary but the order of values is important so i'm not sure if that would work. Maybe bring file1's DTime column into a list and for each entry in that list search DateTime in file2?

Thanks guys

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you don't have duplicate DTime values, this should work:

import csv

file1reader = csv.reader(open("file1.csv"), delimiter=",")
file2reader = csv.reader(open("file2.csv"), delimiter=",")

header1 = file1reader.next() #header
header2 = file2reader.next() #header

for Code, DTime in file1reader:
    for id_, D, Sym, DateTime, Bid, Ask in file2reader:
        if DateTime.startswith(DTime): # found it
            print DateTime, Bid, Ask   # output data
            break                      # break and continue where we left next time

Edit

import csv
from datetime import datetime

file1reader = csv.reader(open("file1.csv"), delimiter=",")
file2reader = csv.reader(open("file2.csv"), delimiter=",")

header1 = file1reader.next() #header
header2 = file2reader.next() #header

for Code, DTime in file1reader:
    DTime = datetime.strptime(DTime, "%Y-%m-%d %H:%M")
    for id_, D, Sym, DateTime, Bid, Ask in file2reader:
        DateTime = datetime.strptime(DateTime, "%Y-%m-%d %H:%M:%S")
        if DateTime>=DTime: # found it
            print DateTime, Bid, Ask   # output data
            break                      # break and continue where we left next time

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...