python - Pandas read_csv dtype read all columns but few as string

Question

Welcome To Ask or Share your Answers For Others

python - Pandas read_csv dtype read all columns but few as string

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Pandas read_csv dtype read all columns but few as string

I'm using Pandas to read a bunch of CSVs. Passing an options json to dtype parameter to tell pandas which columns to read as string instead of the default:

dtype_dic= { 'service_id':str, 'end_date':str, ... }
feedArray = pd.read_csv(feedfile , dtype = dtype_dic)

In my scenario, all the columns except a few specific ones are to be read as strings. So instead of defining several columns as str in dtype_dic, I'd like to set just my chosen few as int or float. Is there a way to do that?

It's a loop cycling through various CSVs with differing columns, so a direct column conversion after having read the whole csv as string (dtype=str), would not be easy as I would not immediately know which columns that csv is having. (I'd rather spend that effort in defining all the columns in the dtype json!)

Edit: But if there's a way to process the list of column names to be converted to number without erroring out if that column isn't present in that csv, then yes that'll be a valid solution, if there's no other way to do this at csv reading stage itself.

Note: this sounds like a previously asked question but the answers there went down a very different path (bool related) which doesn't apply to this question. Pls don't mark as duplicate!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:47:43+0000

EDIT - sorry, I misread your question. Updated my answer.

You can read the entire csv as strings then convert your desired columns to other types afterwards like this:

df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)

Another approach, if you really want to specify the proper types for all columns when reading the file in and not change them after: read in just the column names (no rows), then use those to fill in which columns should be strings

col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)

Categories

python - Pandas read_csv dtype read all columns but few as string

python - Pandas read_csv dtype read all columns but few as string

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags