Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Efficient way to filter one data frame by ranges in another

Let's say I have a data frame containing a bunch of data and a date/time column indicating when each data point was collected. I have another data frame that lists time spans, where a "Start" column indicates the date/time when each span starts and an "End" column indicates the date/time when each span ends.

I've created a dummy example below using simplified data:

main_data <- data.frame(Day = 1:30)

spans_to_filter <-
    data.frame(Span_number = 1:6,
               Start = c(2, 7, 1, 15, 12, 23),
               End = c(5, 10, 4, 18, 15, 26))

I want to keep only the rows of main_data whose Day falls inside at least one of the spans. I toyed around with a few ways of solving this problem and ended up with the following solution:

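library(dplyr)
filtered.main_data <-
    main_data %>%
    # evaluate the test one row (one Day) at a time
    rowwise() %>%
    # present is TRUE when Day lies inside at least one span (inclusive bounds)
    mutate(present = any(Day >= spans_to_filter$Start & Day <= spans_to_filter$End)) %>%
    filter(present) %>%
    data.frame()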

This works, but it can take a while to process when I have a lot of data (I assume because I'm performing a row-wise comparison). I'm still learning the ins and outs of R, and I was wondering whether there is a more efficient way of performing this operation, preferably using dplyr/tidyr?


1 Reply


Starting with v1.9.8, the data.table package implements non-equi joins. Building on them, I wrote a wrapper function, inrange() (also available as the infix operator %inrange%), for exactly this kind of operation: testing whether a point lies in any of the supplied intervals, returning TRUE if so and FALSE otherwise.

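library(data.table) # requires v1.9.8 or later
# setDT() converts main_data to a data.table by reference; columns 2:3 are Start and End
setDT(main_data)[Day %inrange% spans_to_filter[, 2:3]] # inclusive bounds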
#     Day
#  1:   1
#  2:   2
#  3:   3
#  4:   4
#  5:   5
#  6:   7
#  7:   8
#  8:   9
#  9:  10
# 10:  12
# 11:  13
# 12:  14
# 13:  15
# 14:  16
# 15:  17
# 16:  18
# 17:  23
# 18:  24
# 19:  25
# 20:  26
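
For readers curious what the wrapper saves, here is a rough sketch of the same filter written as an explicit non-equi join. It assumes data.table v1.9.8 or later; main_dt, spans_dt and idx are illustrative names that do not appear in the original answer, and this is not necessarily how inrange() works internally.

```r
library(data.table)

# Same dummy data as in the question, built as data.tables.
# Integer literals are used so both join columns share a type.
main_dt  <- data.table(Day = 1:30)
spans_dt <- data.table(Start = c(2L, 7L, 1L, 15L, 12L, 23L),
                       End   = c(5L, 10L, 4L, 18L, 15L, 26L))

# Non-equi join: for each span, find the rows of main_dt with Start <= Day <= End.
# which = TRUE returns row numbers of main_dt instead of the joined columns;
# nomatch = 0L drops spans that match no row.
idx <- main_dt[spans_dt, on = .(Day >= Start, Day <= End),
               which = TRUE, nomatch = 0L]

# A Day covered by overlapping spans is matched more than once, so de-duplicate.
result <- main_dt[sort(unique(idx))]
```

The explicit join needs the de-duplication step because spans may overlap, which is one reason the %inrange% form is more convenient for a plain yes/no filter.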

See ?inrange for more.
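
Since the question suspects the row-wise step is the bottleneck, a vectorized base R version of the same test may also be worth trying on moderately sized data. This is only a sketch and not part of the original answer: in_span, keep and filtered are illustrative names, and outer() allocates a rows-by-spans logical matrix, so memory use grows with both inputs.

```r
# Same dummy data as in the question.
main_data <- data.frame(Day = 1:30)
spans_to_filter <- data.frame(Span_number = 1:6,
                              Start = c(2, 7, 1, 15, 12, 23),
                              End   = c(5, 10, 4, 18, 15, 26))

# outer() compares every Day with every span boundary in one vectorized call,
# giving a logical matrix with one row per Day and one column per span.
in_span <- outer(main_data$Day, spans_to_filter$Start, ">=") &
           outer(main_data$Day, spans_to_filter$End,   "<=")

# A row is kept if its Day falls inside at least one span (inclusive bounds).
keep <- rowSums(in_span) > 0

filtered <- main_data[keep, , drop = FALSE]
```

The keep vector is an ordinary logical vector, so it can be used for subsetting in whichever way fits the rest of the pipeline. For very large inputs the data.table approach above is likely the better fit, because it avoids building the full matrix.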

