Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Clean way to convert quarterly periods to datetime in pandas

EDIT:
If you're coming to this question and your string looks like 1996-Q1, then just use pd.to_datetime(df['Quarter']) to convert it to a proper pandas datetime. This question is about solving all the quarter dates that are not in this standard format.

ORIGINAL QUESTION:
I'm looking for a nice, readable and understandable way (one that you can remember for the next time) to convert Q3 1996 to a pandas datetime, for example 1996-07-01 in this case. So far I have found this, but it's mighty ugly:

import pandas as pd

df = pd.DataFrame({'Quarter':['Q3 1996', 'Q4 1996', 'Q1 1997']})

df['date'] = (
    pd.to_datetime(
        df['Quarter'].str.split(' ').apply(lambda x: ''.join(x[::-1]))
))

print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

I was hoping the following would work, because it's readable, but unfortunately it doesn't:

df['date'] = pd.to_datetime(df['Quarter'], format='%q %Y')

The problem is also that quarter and year are apparently in the wrong order for pandas to do simple processing.
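
One likely reason the format string fails is that the standard strptime directive set has no %q code, so there is nothing a quarter token could match. The sketch below shows this with Python's own datetime.strptime only; try_parse is a hypothetical helper, and the pandas call above may report the problem with a different message.

```python
from datetime import datetime


def try_parse(text, fmt):
    """Return the parsed datetime, or the error message if parsing fails."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        return str(exc)


# %q is not a strptime directive, so this yields an error message, not a date.
result = try_parse('Q3 1996', '%q %Y')

# A layout built from supported directives parses as expected.
ok = try_parse('1996-07-01', '%Y-%m-%d')
```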

Can anyone help me find a cleaner way of converting Q3 1996 to a pandas datetime?


1 Reply

You can (and should) use pd.PeriodIndex as a first step, then convert to timestamp using PeriodIndex.to_timestamp:

qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
qs

0    1996-Q3
1    1996-Q4
2    1997-Q1
Name: Quarter, dtype: object

df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

The initial replace step is necessary because PeriodIndex expects year-first period strings such as 1996-Q3 (the %Y-%q layout).
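
The two steps can be wrapped into one reusable function. This is only a sketch: quarters_to_timestamps is a hypothetical name, and it assumes every value looks like "Q3 1996". The regex=True argument is spelled out because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd


def quarters_to_timestamps(quarters: pd.Series) -> pd.DatetimeIndex:
    """Convert strings like 'Q3 1996' to the first day of each quarter."""
    # Reorder 'Q3 1996' into '1996-Q3', a layout PeriodIndex can parse.
    reordered = quarters.str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
    # Build quarterly periods, then take the start of each period.
    return pd.PeriodIndex(reordered.tolist(), freq='Q').to_timestamp()


df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})
df['date'] = quarters_to_timestamps(df['Quarter'])
```

Going through periods keeps the quarterly meaning explicit, and to_timestamp() defaults to the start of each period.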


Another option is to use pd.to_datetime after performing string replacement in the same way as before.

df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True), errors='coerce')
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

If performance is important, a plain-Python split and join inside a list comprehension does the same reordering and still reads cleanly:

df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])

df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
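
To see what the parsers compute in every variant above, the mapping can also be written out by hand: quarter q starts in month 3 * (q - 1) + 1. A sketch using only the standard library; quarter_start is a hypothetical helper, not part of pandas, and it assumes labels are exactly of the form "Qn YYYY".

```python
from datetime import date


def quarter_start(label: str) -> date:
    """Return the first day of the quarter named by a label like 'Q3 1996'."""
    quarter, year = label.split()
    q = int(quarter[1:])  # drop the leading 'Q'
    if not 1 <= q <= 4:
        raise ValueError(f"quarter must be between 1 and 4, got {q}")
    # Q1 -> January (1), Q2 -> April (4), Q3 -> July (7), Q4 -> October (10)
    return date(int(year), 3 * (q - 1) + 1, 1)


starts = [quarter_start(s) for s in ['Q3 1996', 'Q4 1996', 'Q1 1997']]
```

The resulting date objects can be passed to pd.to_datetime if a datetime64 column is needed.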
