Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to find changing points in a dataset

I need to find the points at which an increasing or decreasing trend starts and ends. In this data, a difference of up to ~10 between consecutive values is considered noise (i.e. not an increase or decrease). From the sample data given below, the first increasing trend would start at 317 and end at 432, and another would start at 441 and end at 983. Each of these points is to be recorded in a separate vector.

sample <- c(312,317,380,432,438,441,509,641,779,919,
            983,980,978,983,986,885,767,758,755)

Below is an image of the main change points. Can anyone suggest an R method for this?

[image: plot of the sample data with the main change points marked]

1 Reply

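Here's how to make the change point vector. Note that vec below is the question's sample data shifted up by 100000, with one extra value (100755) appended at the end: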

vec <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,
         100983,100980,100978,100983,100986,100885,100767,100758,100755,100755)

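#this finds your trend start/stops:
#flag every step larger than the noise threshold (10), then take the position
#right after each run of flagged or unflagged steps
idx <- cumsum(rle(abs(diff(vec))>10)$lengths)+1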

#create new vector of change points:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767 100755

#(opt.) to ignore the first and last observation as a change point:
idx <- idx[which(idx!=1 & idx!=length(vec))]

#update new vector if you want the "opt." restrictions applied:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767

#you can split newVec by start/stop change points like this:
start_changepoints <- newVec[c(TRUE,FALSE)]
print(start_changepoints)
[1] 100317 100441 100986

end_changepoints <- newVec[c(FALSE,TRUE)]
print(end_changepoints)
[1] 100432 100983 100767

#to count the number of events, just measure the length of start_changepoints:
length(start_changepoints)
[1] 3
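
One caveat with the TRUE/FALSE split above: it assumes the series begins with a noise step, so that odd positions in newVec are starts and even positions are ends. If the data opens inside a trend, the two swap, and a rise followed directly by a fall is counted as a single event. A minimal sketch that reads starts and ends straight from the runs instead, and also records the direction, could look like the following. The names find_trends, noise and sample_data are made up for this illustration and are not part of the answer above.

```r
# Hypothetical helper: one row per trend, with start/end position, value and direction.
find_trends <- function(x, noise = 10) {
  d <- diff(x)
  # +1 for a rise, -1 for a fall, 0 for a step inside the noise band
  step <- ifelse(abs(d) > noise, sign(d), 0)
  r <- rle(step)
  last  <- cumsum(r$lengths)       # last step of each run
  first <- last - r$lengths + 1    # first step of each run
  keep  <- r$values != 0           # drop the noise runs
  data.frame(
    start_index = first[keep],
    end_index   = last[keep] + 1,  # step i joins x[i] and x[i + 1]
    start_value = x[first[keep]],
    end_value   = x[last[keep] + 1],
    direction   = ifelse(r$values[keep] > 0, "increase", "decrease")
  )
}

# The sample data from the question
sample_data <- c(312,317,380,432,438,441,509,641,779,919,
                 983,980,978,983,986,885,767,758,755)
trends <- find_trends(sample_data)
print(trends)
```

On the question's data this should report the same three events as above (317 to 432, 441 to 983, 986 to 767), the first two as increases and the last as a decrease.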

If you then want to plot that, you can use this:

library(ggplot2)

#preps data for plot
df <- data.frame(index=seq_along(vec), vec)
#one row per change point; starts and ends alternate
cp <- data.frame(index=idx,
                 state=rep(c("Start","End"), length.out=length(idx)))

#plot
ggplot(df, aes(x=index, y=vec)) +
  geom_line() +
  geom_point() +
  geom_vline(data=cp, aes(xintercept=index, col=state),
             lty=2, lwd=1) +
  scale_color_manual(values=c(Start="green", End="red"),
                     breaks=c("Start","End")) +
  xlab("Index") +
  ylab("Value") +
  guides(col=guide_legend("Trend State"))
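
If ggplot2 is not available, roughly the same picture can be drawn with base graphics. This is only a sketch under the same assumptions as the answer (threshold of 10, starts and ends alternating, first and last observations ignored); line_cols is a name introduced here for illustration.

```r
vec <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,
         100983,100980,100978,100983,100986,100885,100767,100758,100755,100755)

# change point positions, as computed in the answer
idx <- cumsum(rle(abs(diff(vec)) > 10)$lengths) + 1
idx <- idx[idx != 1 & idx != length(vec)]

# starts and ends alternate: green for a start, red for an end
line_cols <- rep(c("green", "red"), length.out = length(idx))

plot(seq_along(vec), vec, type = "o", xlab = "Index", ylab = "Value")
abline(v = idx, col = line_cols, lty = 2, lwd = 2)
legend("topleft", legend = c("Start", "End"), col = c("green", "red"),
       lty = 2, lwd = 2, title = "Trend State")
```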

Output:

[image: line plot of vec with dashed vertical lines marking trend starts (green) and ends (red)]

