Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to record bad lines skipped by pandas

I'm reading a CSV file with pandas using error_bad_lines=False.

A warning is printed whenever a bad line is encountered. However, I want to keep a record of all the bad line numbers so I can feed them into another program. Is there an easy way to do that?

I thought about iterating over the file with chunksize=1 and catching the CParserError that ought to be raised for each bad line. When I do this, though, no CParserError is raised for the bad lines, so there is nothing to catch.
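Because error_bad_lines=False tells pandas to skip bad lines with a warning instead of raising, no exception ever reaches your code. One way around this is to find the bad line numbers yourself with the standard csv module. The sketch below is a substitute technique, not pandas' own check, and it assumes that a "bad" line is a non-blank row with more fields than the header (pandas pads rows with too few fields with NaN rather than skipping them). It does not replicate other read_csv options such as skiprows, comment, or header=None. The helper name find_bad_lines is hypothetical.

```python
import csv
import io
from typing import Iterable, List, Optional


def find_bad_lines(lines: Iterable[str], delimiter: str = ",") -> List[int]:
    """Return 1-based line numbers of rows with more fields than the header.

    Assumption: this approximates pandas' default rule, where a row with too
    many fields is "bad" while a row with too few is padded with NaN.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    expected: Optional[int] = None
    bad: List[int] = []
    for row in reader:
        if not row:  # blank line; pandas skips these by default
            continue
        if expected is None:
            expected = len(row)  # the header row sets the expected field count
        elif len(row) > expected:
            # line_num counts physical lines read so far (1-based), so a quoted
            # field spanning several lines is reported at its last line.
            bad.append(reader.line_num)
    return bad


sample = "a,b\n1,2\n3,4,5\n6,7\n8\n9,10,11,12\n"
print(find_bad_lines(io.StringIO(sample)))  # [3, 6]
```

For a real file, open it with newline='' (as the csv module documentation recommends) and pass the file object in; the returned list can then be written out, one number per line, for the other program. Check a few results against pandas' own warnings, since quoting and multi-line fields can make the numbering differ.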

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

The warnings are printed to the standard error stream, so you can capture them in a file by temporarily redirecting sys.stderr:

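import contextlib
import pandas as pd

# Everything written to sys.stderr inside the with-block (including pandas'
# "Skipping line ..." warnings) goes to bad_lines.txt; unlike assigning to
# sys.stderr directly, redirect_stderr restores the original stream on exit.
# In pandas 1.3+ the equivalent option is on_bad_lines='warn'
# (error_bad_lines was removed in pandas 2.0).
with open('bad_lines.txt', 'w') as fp, contextlib.redirect_stderr(fp):
    df = pd.read_csv('my_data.csv', error_bad_lines=False)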
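The captured file holds pandas' warning text rather than a clean list of numbers. Assuming each skipped row is reported in the C parser's "Skipping line N: expected X fields, saw Y" wording (check a sample of your own output, since the exact form varies between pandas versions, and some older versions print the whole batch as a bytes literal), a regular expression can pull the line numbers out. The helper name bad_line_numbers and the sample log are hypothetical.

```python
import re
from typing import List

# Assumption: pandas reports each skipped row as "Skipping line N: ...".
# Some older versions print the batch as a bytes literal, e.g.
# b'Skipping line 3: expected 2 fields, saw 3\n...'; the regex copes with both.
SKIP_RE = re.compile(r"Skipping line (\d+)")


def bad_line_numbers(log_text: str) -> List[int]:
    """Return the line numbers from every 'Skipping line N' message, in order."""
    return [int(n) for n in SKIP_RE.findall(log_text)]


# Hypothetical captured log, modelled on the bytes-literal form.
SAMPLE_LOG = r"b'Skipping line 3: expected 2 fields, saw 3\nSkipping line 6: expected 2 fields, saw 4\n'"
print(bad_line_numbers(SAMPLE_LOG))  # [3, 6]
```

To use it, read bad_lines.txt into a string and pass it in; writing the result back out one number per line gives a format that is easy for another program to consume.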
