Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - data.table with two string columns of set elements, extract unique rows with each row unsorted

Suppose I have a data.table like this:

Table:

V1 V2
 A  B
 C  D
 C  A
 B  A
 D  C

I want each row to be treated as an unordered set, so that B A and A B count as the same row. After removing the duplicates, I want to get:

V1 V2
 A  B
 C  D
 C  A

To do that, I currently sort the table row by row and then use unique to remove the duplicates. The row-wise sort is quite slow when I have millions of rows. Is there an easy way to remove the duplicates without sorting?
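The question does not show its code. A minimal sketch of the sort-then-unique baseline it describes might look like the following, assuming base R apply and sort are used for the row-wise step (the variable names sorted and slow_result are made up for illustration):

```r
library(data.table)

# The table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Sort the two values within each row. apply() converts the table to a
# character matrix and calls sort() once per row, which is the slow part
# on millions of rows.
sorted <- t(apply(dt, 1, sort))

# Keep the original (unsorted) rows whose sorted form has not been seen yet
slow_result <- dt[!duplicated(sorted)]
```

Indexing the original table with the duplicate mask, rather than calling unique on the sorted matrix, keeps each surviving row in its original column order.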


1 Reply


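For just two columns you can avoid the row-wise sort with the following trick: pmin(a, b) and pmax(a, b) are vectorised and together give each row an order-independent key, and .I[1] picks the index of the first row for each key: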

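library(data.table)

# Example data: rows 4 and 5 mirror rows 2 and 1
dt = data.table(a = letters[1:5], b = letters[5:1])
#   a b
#1: a e
#2: b d
#3: c c
#4: d b
#5: e a

# Group by the unordered pair (pmin, pmax), take the first row index of
# each group (.I[1], returned in column V1), and subset dt by those indices
dt[dt[, .I[1], by = list(pmin(a, b), pmax(a, b))]$V1]
#   a b
#1: a e
#2: b d
#3: c c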
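
Applied to the table from the question, the same idea can be sketched as below. The column names V1 and V2 come from the question; the names lo, hi, idx, res and res2 are illustrative and not part of the original answer. The second variant, using duplicated on the key columns, is an alternative I am adding here and should give the same rows.

```r
library(data.table)

# The table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Variant 1: group by the unordered pair and keep the first row of each group.
# The index column is named idx so it cannot be confused with the data column V1.
first_rows <- dt[, .(idx = .I[1]),
                 by = .(lo = pmin(V1, V2), hi = pmax(V1, V2))]$idx
res <- dt[first_rows]

# Variant 2: build the key columns once and drop rows whose key was seen before
keys <- dt[, .(lo = pmin(V1, V2), hi = pmax(V1, V2))]
res2 <- dt[!duplicated(keys)]
```

Both variants keep the first occurrence of each pair with its original column order, so C A stays C A rather than becoming A C.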


...