Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - pandas.errors.ParserError: ',' expected after '"'

I am trying to read this dataset from Kaggle: Amazon sales rank data for print and kindle books

The file amazon_com_extras.csv has a column named "TITLE" whose values sometimes contain a comma, so all the fields in this .csv are enclosed in quotation marks:

"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"
"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo","University Of Chicago Press"

I have read other questions related to this problem but none of them solve it. For example, I have tried:

import pandas as pd

df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=',')
df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"')

But nothing seems to work. I am using Python 3.7.2 and pandas 0.24.1.

1 Reply

This happens because some fields in the file contain unescaped quotation marks inside the quoted text.

I am not aware of a way to make the CSV parser handle that without preprocessing.

If you don't care about the affected rows, you can use

pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"', error_bad_lines=False)

That suppresses the exception, but the affected lines are dropped (pandas reports each skipped line in the console).
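
On newer pandas the call looks slightly different: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0 in favor of `on_bad_lines`. A minimal sketch, assuming pandas 1.3 or later and using a small in-memory sample (built from the two rows quoted in this thread) instead of the real file:

```python
import io

import pandas as pd

# In-memory stand-in for amazon_com_extras.csv: header, one clean row,
# and one row with unescaped quotes inside the TITLE field.
sample = (
    '"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"\n'
    '"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo","University Of Chicago Press"\n'
    '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
)

# on_bad_lines="skip" drops rows the parser cannot handle without a message;
# on_bad_lines="warn" drops them too but reports each one.
df = pd.read_csv(io.StringIO(sample), engine="python", on_bad_lines="skip")
print(df["ASIN"].tolist())
```

Only the clean row should survive, so this is only suitable if losing the malformed rows is acceptable.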

An example of such a line:

"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"

Notice the quotes.
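
To see why the parser stops at this line, you can reproduce the error with Python's built-in `csv` module. This sketch assumes that strict parsing (`strict=True`) is close to what pandas' Python engine does internally; that is my understanding, not something I have verified for every pandas version. `try_parse` is just a helper name made up for this example.

```python
import csv

bad_line = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"'


def try_parse(line):
    """Parse one CSV line strictly; return (fields, None) or (None, error message)."""
    try:
        return next(csv.reader([line], strict=True)), None
    except csv.Error as exc:
        return None, str(exc)


fields, error = try_parse(bad_line)
print(error)
```

The parser reads `""` at the start of the fourth field as an escaped quote or an empty field, then finds `H` where it expects a delimiter.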

A more standard CSV dialect would escape the inner quotes by doubling them:

1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"
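
One way to check what standard escaping looks like is to let Python's `csv` module write the row. A small sketch; `QUOTE_ALL` is my choice here so that every field is quoted as in the original file (which is why the first field comes out quoted too):

```python
import csv
import io

row = ["1405246510", "book", "hardcover", '"Hannah Montana" Annual 2010', "Unknown", "Egmont Books Ltd"]

buffer = io.StringIO()
# QUOTE_ALL wraps every field in quotes; embedded quotes are doubled by default.
csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
rendered = buffer.getvalue()
print(rendered, end="")

# Reading the rendered line back, even in strict mode, recovers the original fields.
parsed = next(csv.reader([rendered], strict=True))
```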

You can, for example, load the file in LibreOffice and re-save it as CSV to get a working CSV dialect, or use another preprocessing technique.
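
As one possible preprocessing technique, the stray quotes can be doubled before the data reaches pandas. This is only a heuristic sketch: it assumes every field is quoted, each record sits on a single line, no field contains the sequence `","` itself, and the file does not already contain correctly doubled quotes (those would be doubled again). `repair_line` is a made-up helper name, not part of pandas or the `csv` module.

```python
import csv


def repair_line(line):
    """Double unescaped quotes inside the fields of a fully quoted CSV line."""
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        return line  # not in the expected all-quoted shape; leave it alone
    # Drop the outer quotes, then split on the "," sequence between fields.
    fields = body[1:-1].split('","')
    # Escape the remaining quotes the standard way: " becomes "".
    return '"' + '","'.join(f.replace('"', '""') for f in fields) + '"\n'


bad = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
fixed = repair_line(bad)
print(fixed, end="")

# The repaired line now parses even in strict mode.
parsed = next(csv.reader([fixed], strict=True))
print(parsed[3])
```

The repaired lines can then be written to a new file, or joined and passed to `pd.read_csv` through an `io.StringIO` object.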

