Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
302 views
in Technique[技术] by (71.8m points)

python - Combining 2 .csv files by common column

I have two .csv files where the first line in file 1 is:

MPID,Title,Description,Model,Category ID,Category Description,Subcategory ID,Subcategory Description,Manufacturer ID,Manufacturer Description,URL,Manufacturer (Brand) URL,Image URL,AR Price,Price,Ship Price,Stock,Condition

The first line from file 2:

Regular Price,Sale Price,Manufacturer Name,Model Number,Retailer Category,Buy URL,Product Name,Availability,Shipping Cost,Condition,MPID,Image URL,UPC,Description

and then rest of every file is filled with info.

As you can see, both files have a common field called MPID (file 1: col 1, file 2: col 9, where the first col is col 1).

I would like to create a new file which will combine these two files by looking at this column (as in: if there is an MPID that is in both files, then in the new file this MPID will appear with both its row from file 1 and its row from file 2). IF one MPID appears only in one file then it should also go into this combined file.

The files are not sorted in any way.

How do I do this on a debian machine with either a shell script or python?

Thanks.

EDIT: Both files dont have commas other than the ones separating the fields.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
sort -t , -k index1 file1 > sorted1
sort -t , -k index2 file2 > sorted2
join -t , -1 index1 -2 index2 -a 1 -a 2 sorted1 sorted2

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...