duplicate data - R, find duplicated rows , regardless of order

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I've been thinking about this problem all night. Here is my matrix:

'a' '#' 3
'#' 'a' 3
 0  'I am' 2
'I am' 0 2

.....

I want to treat rows like the first two as the same, because they differ only in the order of 'a' and '#'. In my case, I want to delete such rows. The toy example is simple: the first two rows are the same, and the third and fourth are the same. But in my real data set, I don't know where the 'same' rows are.

I'm writing in R. Thanks.

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:33:40+0000

Perhaps something like this would work for you. It is not clear what your desired output is though.

x <- structure(c("a", "#", "0", "I am", "#", "a", "I am", "0", "3", 
                 "3", "2", "2"), .Dim = c(4L, 3L))
x
#      [,1]   [,2]   [,3]
# [1,] "a"    "#"    "3" 
# [2,] "#"    "a"    "3" 
# [3,] "0"    "I am" "2" 
# [4,] "I am" "0"    "2" 


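# Sort the values within each row, then flag rows whose sorted
# contents match an earlier row.
duplicated(
  lapply(seq_len(nrow(x)), function(i) {
    row <- x[i, ]
    row[order(row)]
  }))
# [1] FALSE  TRUE FALSE  TRUE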

This splits the matrix up by row, then sorts each row. `duplicated` works on lists too, so wrapping the whole thing in `duplicated` shows which items (rows) are duplicates of an earlier row.
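Since the question asks about deleting such rows, here is a minimal sketch of one way to use that logical vector for subsetting. It swaps the `lapply` call for `apply` plus `t`, which is an alternative rather than the method above, and the names `row_keys`, `is_dup`, `deduped`, and `any_copy` are made up for illustration. It assumes the matrix has no `NA` values, because `sort` drops them.

```r
x <- matrix(c("a", "#", "0", "I am",
              "#", "a", "I am", "0",
              "3", "3", "2", "2"), nrow = 4)

# Sort the values within each row. apply() returns its results as
# columns, so transpose to get one sorted row per original row.
row_keys <- t(apply(x, 1, sort))

# duplicated() on a matrix compares whole rows by default.
is_dup <- duplicated(row_keys)

# Keep the first occurrence of each unordered row.
deduped <- x[!is_dup, , drop = FALSE]

# Alternatively, flag every row that has a twin anywhere, including
# the first occurrence, in case all copies should be dropped.
any_copy <- is_dup | duplicated(row_keys, fromLast = TRUE)
```

With the toy matrix, `deduped` keeps rows 1 and 3, while `x[!any_copy, , drop = FALSE]` would be empty because every row has a twin. Which of the two is wanted depends on the desired output, which the question leaves open.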
