Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dplyr - Complete missing values in time series using previous day data - using R

I have a data frame in which each row is a different date and each column is a different time series.
The dates in the table range from 01.01.2019 to 01.01.2021.
Some of the time series cover only part of that range, and they have missing values on weekends and holidays.

How can I fill in the missing values of each time series with the previous day's value, but only within the date range that the series actually covers? For example, if the series in a given column runs from 01.03.2019 to 01.09.2019, I want to fill only the missing values inside that range.

I have tried to use the fill function:

data <- data %>%
  fill(colnames(data))

but it also fills in the missing values after a time series has ended.
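
A minimal reproduction of that behaviour, as a sketch on a made-up toy column (the name `toy` and its values are illustrative only, not taken from the real data):

```r
library(tidyr)
library(tibble)

# Made-up toy column: the series has its last observed value in row 3,
# so rows 4 and 5 lie after the end of the series.
toy <- tibble(value = c(1, NA, 3, NA, NA))

# fill() carries the last observation forward to the very last row,
# so the trailing NAs in rows 4 and 5 are filled as well.
filled <- fill(toy, value)
filled$value
# 1 1 3 3 3
```

Row 2 is the kind of gap that should be filled; rows 4 and 5 are the ones that should stay NA.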

For example, the df is:

#  Date         time_series_1           time_series_2
1  01-01-2019               NA                      10
2  02-01-2019               5                       NA 
3  03-01-2019               10                      NA 
4  04-01-2019               20                      6 
5  05-01-2019               30                      NA 
6  06-01-2019               NA                      8 
7  07-01-2019               7                       NA 
8  08-01-2019               5                       NA 
9  09-01-2019               NA                      NA
10 10-01-2019               NA                      NA 

The desired output is:

#  Date         time_series_1           time_series_2
1  01-01-2019               NA                      10
2  02-01-2019               5                       10 
3  03-01-2019               10                      10 
4  04-01-2019               20                      6 
5  05-01-2019               30                      6 
6  06-01-2019               30                      8 
7  07-01-2019               7                       NA 
8  08-01-2019               5                       NA 
9  09-01-2019               NA                      NA
10 10-01-2019               NA                      NA 

Thank you!

1 Reply

If I understand correctly, the trick is that you want to fill downward except for the bottommost NAs. And the problem with tidyr's fill is that it goes all the way down.

This isn't a fully-tidyverse solution, but for this data:

library(dplyr)
library(tidyr)

# Example data from the question. The dates there are written dd-mm-yyyy,
# so the rows are ten consecutive days in January 2019.
data <- tribble(
  ~Date, ~time_series_1, ~time_series_2,
  as.Date("2019-01-01"), NA, 10,
  as.Date("2019-01-02"), 5, NA,
  as.Date("2019-01-03"), 10, NA,
  as.Date("2019-01-04"), 20, 6,
  as.Date("2019-01-05"), 30, NA,
  as.Date("2019-01-06"), NA, 8,
  as.Date("2019-01-07"), 7, NA,
  as.Date("2019-01-08"), 5, NA,
  as.Date("2019-01-09"), NA, NA,
  as.Date("2019-01-10"), NA, NA
)

You can determine the ending date for each time series separately:

# Last date on which each series has an observed (non-NA) value
LastTS1Date <- with(data, max(Date[!is.na(time_series_1)]))
LastTS2Date <- with(data, max(Date[!is.na(time_series_2)]))

And then use base R subsetting syntax to change only the rows of the data frame up to those dates:

data[data$Date <= LastTS1Date,] <-
  data[data$Date <= LastTS1Date,] %>% fill(time_series_1)

data[data$Date <= LastTS2Date,] <-
  data[data$Date <= LastTS2Date,] %>% fill(time_series_2)
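
With many columns, repeating these steps once per series gets tedious. One possible way to generalise the idea is sketched below. It assumes a hypothetical helper, `fill_until_last()` (a name made up for this example, not part of dplyr or tidyr), that fills downward only up to the last observed value of a column, and it assumes every column other than `Date` is a time series.

```r
library(dplyr)
library(tidyr)

# The example data from the question, read as ten consecutive days.
data <- tibble(
  Date = seq(as.Date("2019-01-01"), by = "day", length.out = 10),
  time_series_1 = c(NA, 5, 10, 20, 30, NA, 7, 5, NA, NA),
  time_series_2 = c(10, NA, NA, 6, NA, 8, NA, NA, NA, NA)
)

# Hypothetical helper: carry the previous value forward, but only for
# positions up to the last observed (non-NA) value of the vector.
fill_until_last <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) return(x)  # nothing observed: leave untouched
  idx <- seq_len(max(observed))         # positions up to the last observation
  x[idx] <- fill(tibble(value = x[idx]), value)$value
  x                                     # trailing NAs are still NA
}

result <- data %>%
  arrange(Date) %>%                     # filling relies on date order
  mutate(across(-Date, fill_until_last))
```

On the example data this reproduces the desired output from the question. As with `fill()` itself, NAs before the first observed value of a series (row 1 of `time_series_1`) stay NA, because there is no earlier value to carry forward.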
