Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - How to quickly find the "real" table header of a variable-width CSV file with Python?


My goal is to process a batch of .csv files with Python. Their format looks like this:

row0: config1, val1
row1: config2, val2
row2: misc, val_a, val_b, val_c,
row3: misc2, val_a, val_b, val_c, val_d
row4: misc3,val_a, val_b
...
rowk: configk, valk
rowk+1: header1, header2, header3, ..., headern
rowk+2: item1, item2, item3, ..., itemn
....
rowk+m: item1, item2, item3, ..., itemn
,,,,
,,,,
footer, row1
footer, row2

In the listing above, the text before each colon (row0:, row1:, etc.) is my own annotation to label the rows; it is not part of the CSV file.

Rows row0 through rowk have variable length (each row may have a different number of columns). From rowk+1 through rowk+m, every row has the same fixed length (m rows in total, starting with the header). After several empty rows, there may be 2 or 3 footer rows of variable length.

The goal is to locate the header row quickly so that I can load the table as a DataFrame with pandas. I tried several methods but couldn't find a satisfactory one. Any suggestions are appreciated.

Question from: https://stackoverflow.com/questions/65831587/how-to-quickly-find-the-real-table-header-of-a-variable-width-csv-file-by-pyth
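One heuristic that fits the layout described in the question, offered as a sketch and not a definitive answer: count the fields in every row and take the longest run of consecutive non-empty rows that share the same field count. The first row of that run is treated as the header. The function name `find_table_block` and the `SAMPLE` text are hypothetical, made up for illustration. The sketch assumes the table is longer than any run of equal-width rows that happens to occur in the preamble or footer, and that the row just before the header (rowk) has a different width than the table; if either assumption fails, it will pick the wrong row.

```python
import csv
import io


def find_table_block(text):
    """Return (header_index, n_rows) for the longest run of consecutive
    non-empty rows that all have the same number of fields.

    header_index is the 0-based row index of the run's first row;
    n_rows counts the header plus the data rows. Returns (None, 0)
    if the text contains no non-empty row.
    """
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    best_start, best_len = None, 0
    start, prev_width = 0, None
    for i, row in enumerate(rows):
        # A row whose fields are all blank (e.g. ",,,,") counts as empty
        # and ends the current run.
        width = len(row) if any(field.strip() for field in row) else 0
        if width == 0 or width != prev_width:
            start = i  # a new run begins here
        run_len = i - start + 1 if width else 0
        if run_len > best_len:  # on ties, the earliest run wins
            best_start, best_len = start, run_len
        prev_width = width
    return best_start, best_len


# Hypothetical sample modeled on the layout in the question.
SAMPLE = """config1, val1
config2, val2
misc, val_a, val_b, val_c,
misc2, val_a, val_b, val_c, val_d
misc3,val_a, val_b
configk, valk
header1, header2, header3
item1, item2, item3
item1, item2, item3
,,,,
,,,,
footer, row1
footer, row2
"""

header_index, n_rows = find_table_block(SAMPLE)  # (6, 3)
```

With these two numbers, the table could then be loaded with something like pd.read_csv(path, skiprows=header_index, nrows=n_rows - 1, skipinitialspace=True), assuming no quoted field spans more than one line so that row indices match line numbers. The scan reads each file once, so it should stay cheap even for a large batch of files.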


