Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - How to identify and count phases in a time series and assign them to rows?

I am using R to monitor a program that is sometimes idle and sometimes performs tasks. My goal is to identify the program's separate runtimes (uninterrupted active phases) and number them. The data should look like this:

time    activity    runtime
19:01   idle        0   
19:02   task1       1
19:03   task2       1
19:04   idle        0
19:05   idle        0
19:06   idle        0
19:07   task2       2
19:08   task2       2
19:09   task2       2
19:10   task1       2
19:11   idle        0
19:12   task1       3

My data doesn't contain the runtime column yet; that's the column I want to compute. I feel like there should be an easy way to do this, but I can't figure it out.

Runtimes are always separated by idle periods. I don't know in advance how many runtimes the program had. It would also be extremely helpful to have a method that ignores short idle periods, so that only two or more consecutive idle minutes are classified as 0. In that case, 19:11 and 19:12 would still be part of runtime 2.

So far, I only know how to identify the rows where the program switches from one phase to another:

library(dplyr)
# switch is 1 at an idle -> active change and -1 at an active -> idle change
df <- df %>% mutate(status = activity %in% c("task1", "task2")) %>%
  arrange(time) %>% mutate(switch = c(0, diff(status)))

The number of 1s in table(df$switch) then gives the number of active phases, but only if the program was idle at the start of the time frame, and I can't assume that. So I'm not happy with the solution I've come up with so far.
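The dependence on the starting state comes from c(0, diff(status)): the first row can never register a switch. Below is a minimal base-R sketch of the same switch idea. It assumes the rows are already sorted by time and that anything other than "idle" counts as active. It treats the minute before the window as idle by prepending FALSE, so a window that begins mid-task still counts its first phase. The variable names are illustrative.

```r
# Sample activity column from the question (assumed sorted by time)
activity <- c("idle", "task1", "task2", "idle", "idle", "idle",
              "task2", "task2", "task2", "task1", "idle", "task1")

active <- activity != "idle"

# A phase starts where a row is active and the previous row is not;
# prepending FALSE lets a window that starts mid-task count its first phase
run_start <- active & !c(FALSE, head(active, -1))

n_runtimes <- sum(run_start)            # number of active phases
runtime <- cumsum(run_start) * active   # 0 for idle rows, phase number otherwise

n_runtimes  # 3
runtime     # 0 1 1 0 0 0 2 2 2 2 0 3
```

For a window that starts with task1, run_start is TRUE on the first row, so that phase is counted instead of being missed.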



1 Reply


You can use rle() in base R:

df$runtime <- with(rle(df$activity != 'idle'),
                   rep(cumsum(values) * values, lengths))
df

#    time activity runtime
#1  19:01     idle       0
#2  19:02    task1       1
#3  19:03    task2       1
#4  19:04     idle       0
#5  19:05     idle       0
#6  19:06     idle       0
#7  19:07    task2       2
#8  19:08    task2       2
#9  19:09    task2       2
#10 19:10    task1       2
#11 19:11     idle       0
#12 19:12    task1       3

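rle() splits the logical vector activity != 'idle' into runs of consecutive equal values. cumsum(values) increases by 1 at each active run, multiplying by values sets the idle runs to 0, and rep(..., lengths) expands the run numbers back to one value per row.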
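Printing the intermediate rle() result for the sample data shows what each piece of that expression does. This is a small inspection sketch. The names r and run_id are only for illustration.

```r
activity <- c("idle", "task1", "task2", "idle", "idle", "idle",
              "task2", "task2", "task2", "task1", "idle", "task1")

# Collapse the logical vector into runs of identical values
r <- rle(activity != "idle")
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE
r$lengths   # 1 2 3 4 1 1

# Number the runs: cumsum() only increases on TRUE (active) runs,
# and multiplying by values zeroes out the idle runs
run_id <- cumsum(r$values) * r$values   # 0 1 0 2 0 3

# Expand back to one value per row
runtime <- rep(run_id, r$lengths)
runtime     # 0 1 1 0 0 0 2 2 2 2 0 3
```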


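To merge short phases into the same runtime, you can let cumsum() increase only at active runs longer than one minute, so that a one-minute active run keeps the previous runtime number: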

df$runtime <- with(rle(df$activity != 'idle'),
                   rep(cumsum(values & lengths > 1) * values, lengths))
df

#    time activity runtime
#1  19:01     idle       0
#2  19:02    task1       1
#3  19:03    task2       1
#4  19:04     idle       0
#5  19:05     idle       0
#6  19:06     idle       0
#7  19:07    task2       2
#8  19:08    task2       2
#9  19:09    task2       2
#10 19:10    task1       2
#11 19:11     idle       0
#12 19:12    task1       2
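With that version, the one-minute idle gap at 19:11 is still labeled 0. A one-minute active run at the very start of the data would also get 0, because no run has been counted yet.

If the goal is instead that only idle gaps of at least two minutes split runtimes, one option is to relabel short interior idle runs as active before numbering. That gives 19:11 and 19:12 both runtime 2, as asked in the question. This is a sketch under a few assumptions:

- Rows are sorted by time.
- min_gap (a hypothetical parameter name) is the shortest idle run that still separates runtimes.
- Idle runs at the very start or end of the data stay idle, because their full length is unknown.

```r
# Hypothetical helper: number runtimes, ignoring idle gaps shorter than min_gap
count_runtimes <- function(activity, min_gap = 2) {
  r <- rle(activity != "idle")
  n <- length(r$values)
  idx <- seq_len(n)
  # Interior idle runs that are too short to count as a real break
  short_gap <- !r$values & r$lengths < min_gap & idx > 1 & idx < n
  r$values[short_gap] <- TRUE
  # Rebuild the per-row vector and re-run rle so merged runs become one run
  active <- inverse.rle(r)
  with(rle(active), rep(cumsum(values) * values, lengths))
}

activity <- c("idle", "task1", "task2", "idle", "idle", "idle",
              "task2", "task2", "task2", "task1", "idle", "task1")
count_runtimes(activity)
# [1] 0 1 1 0 0 0 2 2 2 2 2 2
```

With the df from the data below, df$runtime <- count_runtimes(df$activity) adds the column. With min_gap = 1, nothing is merged and the result matches the first rle() version.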

data

df <- structure(list(time = c("19:01", "19:02", "19:03", "19:04", "19:05", 
"19:06", "19:07", "19:08", "19:09", "19:10", "19:11", "19:12"
), activity = c("idle", "task1", "task2", "idle", "idle", "idle", 
"task2", "task2", "task2", "task1", "idle", "task1")), row.names = c(NA, 
-12L), class = "data.frame")
