Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
754 views
in Technique[技术] by (71.8m points)

r - Remove consecutive duplicates from dataframe

I have a data frame that I want to remove duplicates that are consecutive (in base). I know rle may be helpful here but can't think of how to use it. The example output will help to illuminate what I'm asking for.

Generate sample data:

set.seed(12)
samps <- sample(1:5, 20, T)
dat <- data.frame(v1=LETTERS[samps], v2=month.abb[samps])
dat[10, 2] <- "Mar"

Sample data:

   v1  v2
1   A Jan
2   E May
3   E May
4   B Feb
5   A Jan
6   A Jan
7   A Jan
8   D Apr
9   A Jan
10  A Mar
11  B Feb
12  E May
13  B Feb
14  B Feb
15  B Feb
16  C Mar
17  C Mar
18  C Mar
19  D Apr
20  A Jan

Desired outcome:

   v1  v2
1   A Jan
3   E May
4   B Feb
7   A Jan
8   D Apr
10  A Mar
11  B Feb
12  E May
15  B Feb
18  C Mar
19  D Apr
20  A Jan
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a way, not with rle, but a way none-the-less:

dat[with(dat, c(TRUE, diff(as.numeric(interaction(v1, v2))) != 0)), ]

This assumes you're using factor columns, as your sample data implies.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...