Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to turn an itertools "grouper" object into a list

I am trying to learn how to use itertools.groupby in Python and I wanted to find the size of each group of characters. At first I tried to see if I could find the length of a single group:

from itertools import groupby
len(list(list( groupby("cccccaaaaatttttsssssss") )[0][1]))

and I would get 0 every time.

I did a little research and found out that other people were doing it this way:

from itertools import groupby
for key,grouper in groupby("cccccaaaaatttttsssssss"):
    print(key, len(list(grouper)))

That works great. What confuses me is why the latter code works but the former does not. If I wanted to get only the nth group, as I was trying to do in my original code, how would I do that?


1 Reply


The reason that your first approach doesn't work is that the groups get "consumed" when you create that list with

list(groupby("cccccaaaaatttttsssssss"))

To quote from the groupby docs:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.

Let's break it down into stages.

from itertools import groupby

a = list(groupby("cccccaaaaatttttsssssss"))
print(a)
b = a[0][1]
print(b)
print('So far, so good')
print(list(b))
print('What?!')

output

[('c', <itertools._grouper object at 0xb715104c>), ('a', <itertools._grouper object at 0xb715108c>), ('t', <itertools._grouper object at 0xb71510cc>), ('s', <itertools._grouper object at 0xb715110c>)]
<itertools._grouper object at 0xb715104c>
So far, so good
[]
What?!

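Our itertools._grouper object at 0xb715104c is empty because it does not hold its own copy of the items. It draws them from the same underlying iterator as the "parent" iterator returned by groupby, and those items are now gone: that first list call ran the parent all the way to the end before we ever looked inside a group.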

It's really no different to what happens if you try to iterate twice over any iterator, e.g. a simple generator expression.

g = (c for c in 'python')
print(list(g))
print(list(g))

output

['p', 'y', 't', 'h', 'o', 'n']
[]
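
One way around this, as a minimal sketch that assumes the groups are small enough to hold in memory, is to turn each group into a list while groupby is still positioned on it, and only then index into the result. The names s, groups, first_key and first_items are just illustrative.

```python
from itertools import groupby

s = "cccccaaaaatttttsssssss"

# Materialize each group before groupby advances to the next key,
# so every entry holds real items rather than an exhausted iterator.
groups = [(key, list(group)) for key, group in groupby(s)]

first_key, first_items = groups[0]
print(first_key, len(first_items))
```

This should print c 5, which is what the original one-liner was trying to get.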

BTW, here's another way to get the length of a groupby group if you don't actually need its contents; it's a little cheaper (and uses less RAM) than building a list just to find its length.

from itertools import groupby

for k, g in groupby("cccccaaaaatttttsssssss"):
    print(k, sum(1 for _ in g))

output

c 5
a 5
t 5
s 7
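
As for getting only the nth group, one possible approach is to skip ahead with itertools.islice and consume the group right away, before the groupby object is advanced again. This is only a sketch: nth_group is a hypothetical helper name, not something itertools provides, and n is taken to be 0-based.

```python
from itertools import groupby, islice

def nth_group(iterable, n):
    """Return (key, items) for the n-th group (0-based), or None if there are fewer groups.

    Hypothetical helper for illustration; not part of itertools.
    """
    for key, group in islice(groupby(iterable), n, None):
        # Consume the group immediately, while groupby is still positioned on it.
        return key, list(group)
    return None

print(nth_group("cccccaaaaatttttsssssss", 2))
```

Here the call should return the third group, i.e. the key 't' together with a list of five 't' characters. Unlike building a list of all groups first, this stops reading the input once the requested group has been collected.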
