Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
378 views
in Technique[技术] by (71.8m points)

python - Remove duplicates in a list of lists based on the third item in each sublist

I have a list of lists that looks like:

c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv'] 
      ...]

c has about 30,000 interior lists. What I'd like to do is eliminate duplicates based on the 4th item on each interior list. So the list of lists above would look like:

c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]

Here is what I have so far:

d = [] #list that will contain condensed c
d.append(c[0]) #append first element, so I can compare lists
for bact in c: #c is my list of lists with 30,000 interior list
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)  

I think this should work, but it just runs and runs. I let it run for 30 minutes, then killed it. I don't think the program should take so long, so I'm guessing there is something wrong with my logic.

I have a feeling that creating a whole new list of lists is pretty stupid. Any help would be much appreciated, and please feel free to nitpick as I am learning. Also please correct my vocabulary if it is incorrect.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'd do it like this:

seen = set()
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]

Explanation:

seen is a set which keeps track of already encountered fourth elements of each sublist. cond is the condensed list. In case x[3] (where x is a sublist in c) is not in seen, x will be added to cond and x[3] will be added to seen.

seen.add(x[3]) will return None, so not seen.add(x[3]) will always be True, but that part will only be evaluated if x[3] not in seen is True since Python uses short circuit evaluation. If the second condition gets evaluated, it will always return True and have the side effect of adding x[3] to seen. Here's another example of what's happening (print returns None and has the "side-effect" of printing something):

>>> False and not print('hi')
False
>>> True and not print('hi')
hi
True

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...