Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to fill missing timestamps in a time column for a given date in pandas

I have time-series data like this:

print(df)

    ric     datel       timel        val
0   xyz     2017-01-01  09:00:00     2
1   xyz     2017-01-01  09:04:00     5
2   xyz     2017-01-01  09:37:00     6

Now I need to fill in the missing timestamps, one per minute, up to 09:45:00.

Expected Output:

    ric     datel       timel        val
0   xyz     2017-01-01  09:00:00     2
1   xyz     2017-01-01  09:01:00     nan
2   xyz     2017-01-01  09:02:00     nan
3   xyz     2017-01-01  09:03:00     nan
4   xyz     2017-01-01  09:04:00     5
...
...
37  xyz     2017-01-01  09:37:00     6
...
...
45  xyz     2017-01-01  09:45:00     nan

What I tried:

df1=df.resample("1 min", on ='datel').first()

which gives this output:

              ric   datel       timel     val
datel                   
2017-01-01  xyz     2017-01-01  09:00:00    2

I also tried pd.date_range, but it works on datetime values, and I have separate date and time columns. Is there a way to do this without combining the date and time columns into a single datetime column?

1 Reply

The main idea is to reindex by the times created with date_range:

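import pandas as pd

# Parse the time strings and keep only the time-of-day part
df['timel'] = pd.to_datetime(df['timel'], format='%H:%M:%S').dt.time
# Build every minute from the earliest time up to 09:45:00
start = pd.to_datetime(str(df['timel'].min()), format='%H:%M:%S')
end = pd.to_datetime('09:45:00', format='%H:%M:%S')
times = pd.date_range(start=start, end=end, freq='1min').time

# Reindex on the full minute grid (missing minutes become NaN rows),
# then restore the original column order
df = df.set_index('timel').reindex(times).rename_axis('timel').reset_index().reindex(columns=df.columns)
# Forward-fill every column except val, which should stay NaN
cols = df.columns.difference(['val'])
df[cols] = df[cols].ffill()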
print (df.head())
   ric       datel     timel  val
0  xyz  2017-01-01  09:00:00  2.0
1  xyz  2017-01-01  09:01:00  NaN
2  xyz  2017-01-01  09:02:00  NaN
3  xyz  2017-01-01  09:03:00  NaN
4  xyz  2017-01-01  09:04:00  5.0
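To run the reindex approach end to end, here is a self-contained sketch that rebuilds the sample frame from the question. The helper name `fill_minutes` and its `end`/`freq` parameters are hypothetical, introduced only for illustration, and the sketch assumes `timel` holds `HH:MM:SS` strings.

```python
import pandas as pd

# Sample data from the question: date and time kept in separate string columns.
df = pd.DataFrame({
    "ric": ["xyz", "xyz", "xyz"],
    "datel": ["2017-01-01", "2017-01-01", "2017-01-01"],
    "timel": ["09:00:00", "09:04:00", "09:37:00"],
    "val": [2, 5, 6],
})


def fill_minutes(frame, end="09:45:00", freq="1min"):
    """Hypothetical helper: reindex `frame` on a minute grid of `timel` up to `end`."""
    # Explicit format, so the start and end share the same placeholder date.
    times = pd.to_datetime(frame["timel"], format="%H:%M:%S")
    stop = pd.to_datetime(end, format="%H:%M:%S")
    # Every minute from the earliest observed time to `end`, as datetime.time objects.
    grid = pd.date_range(times.min(), stop, freq=freq).time
    out = (
        frame.assign(timel=times.dt.time)
        .set_index("timel")
        .reindex(grid)          # minutes not in the data become NaN rows
        .rename_axis("timel")
        .reset_index()[frame.columns]
    )
    # ric and datel are forward-filled; val is left as NaN for the new minutes.
    fill_cols = out.columns.difference(["val"])
    out[fill_cols] = out[fill_cols].ffill()
    return out


result = fill_minutes(df)
print(len(result))
print(result.head())
```

Because the grid is made of `datetime.time` values, the date column is never combined with the time column, which is what the question asked for.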

A similar solution with resample:

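import pandas as pd

# Parse the times; with an explicit format every value gets the same placeholder date
df['timel'] = pd.to_datetime(df['timel'], format='%H:%M:%S')
end = pd.to_datetime('09:45:00', format='%H:%M:%S')

# If there is no row for 09:45:00, add one so that resample extends that far
if not (df['timel'] == end).any():
    df.loc[len(df.index), 'timel'] = end

# One row per minute; first() keeps the first non-null value in each bin
df = df.set_index('timel').resample('1min').first().reset_index().reindex(columns=df.columns)
cols = df.columns.difference(['val'])
df[cols] = df[cols].ffill()
# Drop the placeholder date again
df['timel'] = df['timel'].dt.time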
print (df.head())
   ric       datel     timel  val
0  xyz  2017-01-01  09:00:00  2.0
1  xyz  2017-01-01  09:01:00  NaN
2  xyz  2017-01-01  09:02:00  NaN
3  xyz  2017-01-01  09:03:00  NaN
4  xyz  2017-01-01  09:04:00  5.0
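If the real frame holds several `ric`/`datel` combinations, the same reindex idea can be applied per group. This is a sketch under that assumption: the second instrument `abc` and its row are hypothetical data (not from the question), `fill_group` is a hypothetical helper name, and every group is filled from its own first time up to a fixed 09:45:00.

```python
import pandas as pd

# Hypothetical data: two instruments on the same date (only "xyz" is from the question).
df = pd.DataFrame({
    "ric": ["xyz", "xyz", "abc"],
    "datel": ["2017-01-01", "2017-01-01", "2017-01-01"],
    "timel": ["09:00:00", "09:04:00", "09:40:00"],
    "val": [2, 5, 7],
})

END = pd.to_datetime("09:45:00", format="%H:%M:%S")


def fill_group(group):
    """Hypothetical helper: minute grid from the group's first time up to END."""
    # The parsed date part is only a placeholder; .dt.time drops it again.
    times = pd.to_datetime(group["timel"], format="%H:%M:%S")
    grid = pd.date_range(times.min(), END, freq="1min").time
    out = (
        group.assign(timel=times.dt.time)
        .set_index("timel")
        .reindex(grid)
        .rename_axis("timel")
        .reset_index()
    )
    # ric and datel are constant within a group, so copy them from the first row.
    out["ric"] = group["ric"].iloc[0]
    out["datel"] = group["datel"].iloc[0]
    return out[group.columns]


# Process each (ric, datel) group separately and stack the results.
result = pd.concat(
    [fill_group(g) for _, g in df.groupby(["ric", "datel"], sort=False)],
    ignore_index=True,
)
print(result.groupby("ric").size())
```

A list comprehension plus `pd.concat` is used instead of `groupby(...).apply` so that the grouping columns stay ordinary columns in each piece.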


...