Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - applying regex to a pandas dataframe

I'm having trouble applying a regex function to a column in a pandas DataFrame. Here is the head of my DataFrame:

               Name   Season          School   G    MP  FGA  3P  3PA    3P%
74       Joe Dumars  1982-83   McNeese State  29   NaN  487   5    8  0.625
84      Sam Vincent  1982-83  Michigan State  30  1066  401   5   11  0.455
176  Gerald Wilkins  1982-83     Chattanooga  30   820  350   0    2  0.000
177  Gerald Wilkins  1983-84     Chattanooga  23   737  297   3   10  0.300
243    Delaney Rudd  1982-83     Wake Forest  32  1004  324  13   29  0.448

I thought I had a pretty good grasp of applying functions to DataFrames, so maybe my regex skills are lacking.

Here is what I put together:

import re

def split_it(year):
    return re.findall('(\d\d\d\d)', year)

df['Season2'] = df['Season'].apply(split_it(x))

TypeError: expected string or buffer

The output should be a column called Season2 that contains the year before the hyphen. I'm sure there's an easier way to do it without regex, but more importantly, I'm trying to figure out what I did wrong.

Thanks for any help in advance.

1 Reply

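When I try (a variant of) your code I get NameError: name 'x' is not defined, which it isn't. The argument split_it(x) is evaluated before apply ever runs, and apply wants the function itself rather than the result of calling it.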
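To see why either error can show up, here is a minimal sketch. The two-row frame is made up for illustration, and the literal 5 is a hypothetical stand-in for whatever non-string value a leftover `x` might hold in the asker's session (the question doesn't say).

```python
import re
import pandas as pd

def split_it(year):
    return re.findall(r"(\d\d\d\d)", year)

# Made-up sample data, only the Season column matters here.
df = pd.DataFrame({"Season": ["1982-83", "1983-84"]})

# Python evaluates split_it(x) first, so apply is never reached.
try:
    df["Season"].apply(split_it(x))  # x is not defined anywhere
except NameError as exc:
    name_error = str(exc)

# If x did exist but held a non-string (assumed here to be 5),
# re.findall would raise TypeError instead, again before apply runs.
try:
    df["Season"].apply(split_it(5))
except TypeError as exc:
    type_error = str(exc)

print(name_error)
print(type_error)
```

That second case is one plausible source of the TypeError in the question, though only the asker's session could confirm it.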

You could use either

df['Season2'] = df['Season'].apply(split_it)

or

df['Season2'] = df['Season'].apply(lambda x: split_it(x))

but the second one is just a longer and slower way to write the first one, so there's not much point (unless you have other arguments to handle, which we don't here). Your function will return a list, though:

>>> df["Season"].apply(split_it)
74     [1982]
84     [1982]
176    [1982]
177    [1983]
243    [1982]
Name: Season, dtype: object

although you could easily change that. FWIW, I'd use vectorized string operations and do something like

>>> df["Season"].str[:4].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64

or

>>> df["Season"].str.split("-").str[0].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64
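
If you do want to stay with regex, here is a rough sketch of two ways to get a single year per row instead of a list. The name `first_year` and the small sample frame are mine, not from the question, and the pattern assumes every Season starts with a four-digit year.

```python
import re
import pandas as pd

# Made-up sample reusing a few index labels from the question.
df = pd.DataFrame(
    {"Season": ["1982-83", "1982-83", "1983-84"]},
    index=[74, 84, 177],
)

def first_year(season):
    """Return the first four-digit run as an int, or None if there is none."""
    match = re.search(r"\d{4}", season)
    return int(match.group()) if match else None

# Pass the function itself to apply, without calling it.
by_apply = df["Season"].apply(first_year)

# Vectorized alternative: str.extract returns the first capture group,
# and expand=False gives back a Series rather than a DataFrame.
by_extract = df["Season"].str.extract(r"(\d{4})", expand=False).astype(int)

print(by_apply.tolist())
print(by_extract.tolist())
```

Both versions assume Season has no missing values. A NaN in that column would need handling before either the regex call or the astype(int) conversion.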
