Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
268 views
in Technique[技术] by (71.8m points)

python - Merge multiple csv files with same name in 10 different subdirectory

i have 10 different subdirectories with same file names in each directory ( 20 files per directory ) and column 0 is the index column in each file.

e.g

     **strong text**DIRECTORY  A
    - data_20170101_k.csv
    - data_20170102_k.csv
    - data_20170102_k.csv
    - data_20170103_k.csv
    - data_20170104_k.csv
    - data_20170105_k.csv
    .....
    .....
    - data_20170120_k.csv  



    **DIRECTORY  B**
    - data_20170101_k.csv
    - data_20170102_k.csv
    - data_20170102_k.csv
    - data_20170103_k.csv
    - data_20170104_k.csv
    - data_20170105_k.csv
    .....
    .....
    - data_20170120_k.csv                




    **DIRECTORY  C**
    - data_20170101_k.csv
    - data_20170102_k.csv
    - data_20170102_k.csv
    - data_20170103_k.csv
    - data_20170104_k.csv
    - data_20170105_k.csv
    .....
    .....
    - data_20170120_k.csv                


   Each of the above files contains 6 columns and index_col = 0  with NO
   column headers

   **DIRECTORY  FILES_MERGED**
   - data_20170101_k.csv
   - data_20170102_k.csv
   - data_20170102_k.csv
   - data_20170103_k.csv
   - data_20170104_k.csv
   - data_20170105_k.csv
   .....
   .....
   - data_20170120_k.csv

I want to merge all the files with SAME NAME from EACH subdirectory into 1 file with SAME NAME and save the new file in a NEW subdirectory e.g DIRECTORY FILES_MERGED with INDEX = Column 0. The merged file has only one index column with columns 1,2,3,4,5 from each file with same name from each directory

i have read a csv file into a pandas dataframe

   df= pd.read_csv(filename, sep=",", header = None, usecols=[0, 1, 2, 3, 4, 5])

Here is the format of dataframe

my initial original Dataframe:

             0       1        2        3        4     5
   0  1451606820  1.0862  1.08630  1.08578  1.08578  25
   1  1451608800  1.0862  1.08630  1.08578  1.08610  10
   2  1451608860  1.0862  1.08620  1.08578  1.08578  16
   3  1451610180  1.0862  1.08630  1.08578  1.08578  27
   4  1451610480  1.0858  1.08590  1.08560  1.08578  21
   5  1451610540  1.0857  1.08578  1.08570  1.08578   2
   6  1451610600  1.0857  1.08578  1.08570  1.08578   2
   7  1451610720  1.0857  1.08578  1.08570  1.08578   2
   8  1451610780  1.0857  1.08578  1.08570  1.08578   2

   Column '0' = Datetime in Epoch time 
   Columns 1,2,3,4,5 are values 
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There are many ways to do this, staying in Pandas I did the following.

With the file structure

root/  
├── dir1/  
│   ├── data_20170101_k   
│   ├── data_20170102_k    
│   ├── ...  
├── dir2/    
│   ├── data_20170101_k    
│   └── data_20170101_k  
│   └── ...   
└── ... 

This code will work, it's a little verbose for explanation but you can shorten with implementation.

import glob
import pandas as pd

CONCAT_DIR = "/FILES_CONCAT/"

# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])

#    fullpath
# 0  rootdir1data_20170101_k.csv
# 1  rootdir1data_20170102_k.csv
# 2  rootdir2data_20170101_k.csv
# 3  rootdir2data_20170102_k.csv

# Split the full path into directory and filename
files_split = files['fullpath'].str.rsplit("", 1, expand=True).rename(columns={0: 'path', 1:'filename'})

#    path       filename
# 0  rootdir1  data_20170101_k.csv
# 1  rootdir1  data_20170102_k.csv
# 2  rootdir2  data_20170101_k.csv
# 3  rootdir2  data_20170102_k.csv

# Join these into one DataFrame
files = files.join(files_split)

#    fullpath                       path        filename
# 0  rootdir1data_20170101_k.csv  rootdir1   data_20170101_k.csv
# 1  rootdir1data_20170102_k.csv  rootdir1   data_20170102_k.csv
# 2  rootdir2data_20170101_k.csv  rootdir2   data_20170101_k.csv
# 3  rootdir2data_20170102_k.csv  rootdir2   data_20170102_k.csv

# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs) # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f) # Save dataframe

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...