python - Pandas: ignore all lines following a specific string when reading a file into a DataFrame


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a text file that I want to read into a pandas DataFrame. Its layout can be summarized like this:

[Header]
Some_info = some_info
[Data]
Col1    Col2
0.532   Point
0.234   Point
0.123   Point
1.455   Square
14.64   Square
[Other data]
Other1  Other2
Test1   PASS
Test2   FAIL

My goal is to read only the portion of text between [Data] and [Other data], whose length varies from file to file. The header always has the same length, so the skiprows argument of pandas.read_csv can handle it. However, skipfooter needs the number of lines to skip, and that number changes between files.

What would be the best solution here? I would like to avoid altering the file externally unless there's no other solution.

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:56:14+0000

NumPy's genfromtxt can take a generator as input (rather than a file directly), and the generator can simply stop yielding as soon as it hits your footer. The resulting structured array can then be converted to a pandas DataFrame. It's not ideal, but it didn't look like pandas' read_csv could take a generator directly.

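import numpy as np
import pandas as pd

def skip_variable_footer(infile):
    # Yield lines until the footer marker appears, then stop.
    for line in infile:
        if line.startswith('[Other data]'):
            # Use return, not raise StopIteration: since Python 3.7 (PEP 479)
            # a StopIteration raised inside a generator becomes a RuntimeError.
            return
        yield line


filename = 'data.txt'  # placeholder path; point this at your own file

with open(filename, 'r') as infile:
    # skip_header=3 skips the fixed [Header] block and the [Data] line;
    # names=True then reads the column names from the next line.
    # The default delimiter (None) splits on whitespace, matching the sample above.
    data = np.genfromtxt(skip_variable_footer(infile), skip_header=3,
                         names=True, dtype=None, encoding='utf-8')

df = pd.DataFrame(data)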
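
If you would rather stay within pandas, one possible variation is to collect the wanted lines into an in-memory buffer and hand that to read_csv, which accepts file-like objects. The sketch below is only an illustration under a few assumptions: the helper name read_data_section and the SAMPLE string are made up for this example, the section markers are the ones shown in the question, and the columns are assumed to be separated by whitespace.

```python
import io
from itertools import takewhile

import pandas as pd

# Sample text mirroring the layout in the question (stands in for a real file).
SAMPLE = """[Header]
Some_info = some_info
[Data]
Col1    Col2
0.532   Point
0.234   Point
0.123   Point
1.455   Square
14.64   Square
[Other data]
Other1  Other2
Test1   PASS
Test2   FAIL
"""


def read_data_section(lines, start="[Data]", stop="[Other data]"):
    """Build a DataFrame from the lines between the two markers (hypothetical helper)."""
    lines = iter(lines)
    # Consume everything up to and including the start marker.
    for line in lines:
        if line.startswith(start):
            break
    # Keep lines until the stop marker (or the end of the input) is reached.
    body = takewhile(lambda line: not line.startswith(stop), lines)
    # sep=r"\s+" splits on any run of whitespace (spaces or tabs).
    return pd.read_csv(io.StringIO("".join(body)), sep=r"\s+")


# With a real file: `with open(path) as f: df = read_data_section(f)`
df = read_data_section(io.StringIO(SAMPLE))
print(df)
```

Because the start marker is searched for rather than counted, this version does not depend on the header length either. If the start marker is missing, the buffer ends up empty and read_csv will raise an error, so you may want to handle that case explicitly.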
