Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Subset time series so that selected rows differ by a certain minimum time

I'm using a data.table in R to store a time series. I want to return a subset in which each selected row is at least N seconds after the previously selected row. For example, if I have

library(data.table)
x <- data.table(t=c(0,1,3,4,5,6,7,10,16,17,18,20,21), v=1:13)
x
     t  v
 1:  0  1
 2:  1  2
 3:  3  3
 4:  4  4
 5:  5  5
 6:  6  6
 7:  7  7
 8: 10  8
 9: 16  9
10: 17 10
11: 18 11
12: 20 12
13: 21 13

and I want to sample rows that are at least 5 seconds apart, starting from the first row, I should get a data.table with these time/value pairs:

y <- x[...something...]
y
     t  v
 1:  0  1
 2:  5  5
 3: 10  8
 4: 16  9
 5: 21 13

The time samples aren't regularly spaced, so I can't just take every M-th row. I could of course loop through the data.table rows manually, but I'm wondering whether there's a more convenient way to express this with data.table's indexing.
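For comparison, the manual loop mentioned above could look like the following base-R sketch. `select_min_gap` and `times` are hypothetical names, and the sketch assumes the times are sorted in ascending order. It keeps the first row, then greedily keeps each row that is at least `gap` seconds after the most recently kept one.

```r
# Hypothetical helper: greedy forward scan over sorted times.
# Returns the row indices to keep.
select_min_gap <- function(times, gap) {
  n <- length(times)
  if (n == 0L) return(integer(0))
  keep <- logical(n)   # one flag per row, all FALSE to start
  keep[1L] <- TRUE     # always keep the first row
  last <- times[1L]    # time of the most recently kept row
  for (i in seq_len(n)[-1L]) {
    if (times[i] - last >= gap) {
      keep[i] <- TRUE
      last <- times[i]
    }
  }
  which(keep)
}

times <- c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21)
select_min_gap(times, 5)
# [1]  1  5  8  9 13
```

With the question's table, `x[select_min_gap(x$t, 5)]` should return the five rows listed above.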


Whoever fights a dragon too long becomes a dragon; gaze too long into the abyss, and the abyss gazes back…

1 Reply


Here are a couple of ways to use rolling joins to find the set of rows, w, in your subset:

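t_plus <- 5

# one join per row visited
# roll = -Inf matches each query time to the next row whose t is >= it
# (next observation carried backward); which = TRUE returns that row
# number, or NA once no later row is left.
w   <- c()
nxt <- 1L
while (!is.na(nxt)) {
  w   <- c(w, nxt)
  nxt <- x[.(t[nxt] + t_plus), on = .(t), roll = -Inf, which = TRUE]
}

# join once on all rows
# w0[i] is the first row at least t_plus seconds after row i (NA if none)
w0  <- x[.(t + t_plus), on = .(t), roll = -Inf, which = TRUE]

# follow the chain of "next row" pointers starting from row 1
w   <- c()
nxt <- 1L
while (!is.na(nxt)) {
  w   <- c(w, nxt)
  nxt <- w0[nxt]
}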

Either way, you can then subset with x[w].
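To make explicit what the rolling joins compute, here is a base-R sketch that mimics them with `findInterval()`. It assumes the times are sorted ascending (as `x$t` is in the question); `next_at_or_after` and `times` are hypothetical names, not part of data.table. The helper stands in for the per-row join, and calling it on all rows at once gives the counterpart of `w0`.

```r
# Assumption: `times` is sorted ascending, like x$t in the question.
times  <- c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21)
t_plus <- 5

# Hypothetical stand-in for x[.(target), on = .(t), roll = -Inf, which = TRUE]:
# index of the first element of `times` that is >= target, NA if none.
next_at_or_after <- function(times, target) {
  # With left.open = TRUE, findInterval() counts elements strictly below target.
  i <- findInterval(target, times, left.open = TRUE) + 1L
  i[i > length(times)] <- NA_integer_
  i
}

# Counterpart of w0: one lookup for every row at once.
w0 <- next_at_or_after(times, times + t_plus)
w0
# [1]  5  6  8  8  8  9  9  9 13 NA NA NA NA

# Following the pointers from row 1 gives the selected rows.
w   <- integer(0)
nxt <- 1L
while (!is.na(nxt)) {
  w   <- c(w, nxt)
  nxt <- w0[nxt]
}
w
# [1]  1  5  8  9 13
```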


Comments

In principle, other subsets could also satisfy the OP's condition of being "at least 5 seconds apart"; this is just the one found by iterating forward from the first row.
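A quick way to see that the answer is not unique is to check candidate row sets against the condition directly. `is_min_gap` below is a hypothetical helper, and it assumes the row indices are given in increasing order; with the question's times, a different set starting from row 1 also passes.

```r
# Hypothetical checker: are consecutive selected times at least `gap` apart?
# Assumes `rows` is in increasing order.
is_min_gap <- function(times, rows, gap) {
  all(diff(times[rows]) >= gap)
}

times <- c(0, 1, 3, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21)
is_min_gap(times, c(1, 5, 8, 9, 13), 5)  # TRUE: the forward-iteration result
is_min_gap(times, c(1, 6, 9, 13), 5)     # TRUE: another valid subset
is_min_gap(times, c(1, 2, 8), 5)         # FALSE: rows 1 and 2 are 1 s apart
```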

The second way is based on @DavidArenburg's answer to the Q&A Henrik linked above. Although the question seems the same, I couldn't get that approach to work fully here.

Growing an object inside a loop (as done with w here) is generally a bad idea in R, since each c(w, nxt) copies the whole vector. If you run into performance problems, that is a good place to start improving this code.
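One way to avoid growing w is to preallocate. The sketch below assumes `t_plus > 0`, so every pointer `w0[i]` is greater than `i` and the chain can visit at most `length(w0)` rows. `follow_chain` is a hypothetical name, and `w0` is hard-coded here to the values the join produces for the question's data.

```r
# Hypothetical helper: follow the "next row" pointers in w0 from `start`,
# writing into a preallocated vector instead of growing it with c().
# Assumes t_plus > 0, so w0[i] > i and the chain strictly increases.
follow_chain <- function(w0, start = 1L) {
  out <- integer(length(w0))  # upper bound on the number of visited rows
  n   <- 0L
  nxt <- start
  while (!is.na(nxt)) {
    n      <- n + 1L
    out[n] <- nxt
    nxt    <- w0[nxt]
  }
  out[seq_len(n)]  # drop the unused tail
}

w0 <- c(5L, 6L, 8L, 8L, 8L, 9L, 9L, 9L, 13L, NA, NA, NA, NA)
follow_chain(w0)
# [1]  1  5  8  9 13
```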

