r - data.table with two string columns of set elements, extract unique rows with each row unsorted

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Suppose I have a data.table like this:

Table:

V1 V2
 A  B
 C  D
 C  A
 B  A
 D  C

I want each row to be treated as a set, so that B A and A B count as the same row. After processing, I want to get:

V1 V2
 A  B
 C  D
 C  A

To do that, I currently sort the table row by row and then use unique to remove the duplicates. The row-wise sort is quite slow when the table has millions of rows. Is there an easy way to remove the duplicates without sorting?

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:34:19+0000

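For just two columns you can use the following trick: pmin and pmax return the smaller and the larger element of each row, so grouping by that pair gives (a, b) and (b, a) the same key without any row-wise sort. Keeping the first row index of each group then yields one row per set: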

library(data.table)

dt <- data.table(a = letters[1:5], b = letters[5:1])
#   a b
#1: a e
#2: b d
#3: c c
#4: d b
#5: e a

# group by the unordered pair (pmin, pmax); .I[1] is the first row index in each group
dt[dt[, .I[1], by = list(pmin(a, b), pmax(a, b))]$V1]
#   a b
#1: a e
#2: b d
#3: c c
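
As a quick check, the same idea can be applied to the table from the question. This is only a sketch: the column names V1 and V2 come from the question, while the names lo, hi, idx, first_idx and result are made up here for illustration.

```r
library(data.table)

# The table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Build an order-independent key for each row: (smaller element, larger element).
# Grouping by this key puts "A B" and "B A" in the same group.
# .I[1] is the index of the first row of each group in the original table.
first_idx <- dt[, .(idx = .I[1]),
                by = .(lo = pmin(V1, V2), hi = pmax(V1, V2))]$idx

# Subset the original table, so each kept row retains its original element order
result <- dt[first_idx]
print(result)
# rows kept: A B, C D, C A
```

The kept rows match the output asked for in the question. Note that this relies on there being exactly two columns; with more columns, pmin and pmax alone no longer identify the set.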
