Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - Readable, controllable iterators?

I'm trying to craft an LL(1) parser for a deterministic context-free grammar. One thing I'd like to be able to use is k tokens of lookahead instead of just 1, because it would enable much simpler, less greedy and more maintainable parsing of literal records like numbers, strings, comments and quotations.

Currently, my solution (which works, but which I feel is suboptimal) looks roughly like the following simplified version:

for idx, tok in enumerate(toklist):
    if tok == "blah":
        do(stuff)
    elif tok == "notblah":
        try:
            toklist[idx + 1]  # look ahead one token
        except IndexError:  # there is no next token
            whatever()
        else:
            something_else()

(You can see my actual, much larger implementation at the link above.)

Sometimes, like if the parser finds the beginning of a string or block comment, it would be nice to "jump" the iterator's current counter, such that many indices in the iterator would be skipped.

This can in theory be done with (for example) idx += toklist[idx+1:].index(COMMENT) + 1. In practice, however, each time the loop repeats, idx and tok are rebound to the next pair produced by the enumerate iterator, overwriting any changes to the variables.
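
To make the problem concrete, here is a minimal sketch of that failed jump. The token list and the "/*" and "*/" markers are made up for illustration; the point is only that assigning to the loop variable does not move the underlying iterator.

```python
tokens = ["a", "/*", "x", "y", "*/", "b"]  # hypothetical token list

visited = []
for idx, tok in enumerate(tokens):
    visited.append(tok)
    if tok == "/*":
        # Attempt to jump to the closing marker by reassigning idx.
        idx = tokens.index("*/", idx + 1)
        # This only rebinds the local name. enumerate keeps its own
        # internal position and hands out the next pair regardless.

print(visited)  # every token is still visited, including "x" and "y"
```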

The obvious solution is a while True: or while i < len(toklist): ... i += 1, but there are a few glaring problems with those:

  • Using while over an iterable like a list is really C-like and really not Pythonic, besides being horrendously unreadable and unclear compared to enumerate on the iterable. (Also, with while True:, which may sometimes be desirable, you have to deal with IndexError: list index out of range.)

  • For each cycle of the while, there are two ways to get the current token:

    • using toklist[i] everywhere (ugly, when you could just iterate)
    • assigning toklist[i] to a shorter, more readable, less typo-vulnerable name each cycle. To me, this has the disadvantage of hogging memory and being slow and inefficient.

Perhaps it can be argued that a while loop is what I should use, but I think while loops are for doing things until a condition is no longer true, and for loops are for iterating and looping finitely over an iterator, and a(n iterative LL) parser should clearly implement the latter.

Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?


This is not a duplicate of the linked question, because all of its answers use complicated, unreadable while loops, which is exactly what I don't want.


1 Reply

Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?

No, there isn't. You could implement your own iterator type, though; it wouldn't run as fast as the built-in list iterator (yours being implemented in Python, whereas CPython's is written in C), but it's doable. For example:

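from collections.abc import Iterator

class SequenceIterator(Iterator):
    # Iterator supplies __iter__; only __next__ has to be defined here

    def __init__(self, seq):
        self.seq = seq
        self.idx = 0  # index of the item the next __next__ call returns

    def __next__(self):
        try:
            ret = self.seq[self.idx]
        except IndexError:
            # Past the end of the sequence: end the iteration
            raise StopIteration from None
        else:
            self.idx += 1
            return ret

    def seek(self, offset):
        # Relative jump, counted from the item that would come next
        self.idx += offset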

To use it, you'd do something like:

# Created outside the for loop so you have a name to call seek on
myseqiter = SequenceIterator(myseq)

for x in myseqiter:
    if test(x):
        pass  # do stuff with x
    else:
        # Seek somehow, e.g.
        myseqiter.seek(1)  # Skips the next value
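
As a concrete illustration of the seek call, here is a small sketch that drops a block comment from a token list. The class is restated so the snippet runs on its own; the function name strip_block_comments and the "/*" and "*/" markers are assumptions made for the example, not part of the question's real tokenizer.

```python
from collections.abc import Iterator


class SequenceIterator(Iterator):
    """Restated from the answer so this snippet is self-contained."""

    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __next__(self):
        try:
            ret = self.seq[self.idx]
        except IndexError:
            raise StopIteration from None
        self.idx += 1
        return ret

    def seek(self, offset):
        self.idx += offset


def strip_block_comments(tokens):
    """Return the tokens with everything from "/*" through "*/" dropped."""
    kept = []
    it = SequenceIterator(tokens)
    for tok in it:
        if tok == "/*":
            # it.idx already points at the token after "/*".
            close = tokens.index("*/", it.idx)
            # Jump so that the next item is the one after "*/".
            it.seek(close - it.idx + 1)
        else:
            kept.append(tok)
    return kept


print(strip_block_comments(["a", "/*", "x", "y", "*/", "b"]))  # ['a', 'b']
```

Note that list.index raises ValueError when no closing marker exists, so a real parser would probably want to catch that and report an unterminated comment.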

Adding behaviors like providing the index as well as value is left as an exercise.
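
One possible take on that exercise, offered as a sketch rather than a definitive design: the variant below yields (index, value) pairs like enumerate and adds a peek method for k tokens of lookahead. The class name IndexedSequenceIterator, the peek method, and the choice to clamp backward seeks at 0 are all assumptions of this example.

```python
from collections.abc import Iterator


class IndexedSequenceIterator(Iterator):
    """Yields (index, value) pairs and supports seek() and peek()."""

    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __next__(self):
        if self.idx >= len(self.seq):
            raise StopIteration
        pair = (self.idx, self.seq[self.idx])
        self.idx += 1
        return pair

    def seek(self, offset):
        # Relative jump; clamped at 0 so a backward seek never wraps around.
        self.idx = max(0, self.idx + offset)

    def peek(self, k=1):
        # Slice of up to k upcoming items; nothing is consumed.
        return self.seq[self.idx:self.idx + k]


it = IndexedSequenceIterator(["1", ".", "5", "+", "2"])
numbers = []
for idx, tok in it:
    # Two tokens of lookahead decide whether this starts a decimal literal.
    ahead = it.peek(2)
    if tok.isdigit() and len(ahead) == 2 and ahead[0] == "." and ahead[1].isdigit():
        numbers.append((idx, tok + "." + ahead[1]))
        it.seek(2)  # consume "." and the fractional digit
    elif tok.isdigit():
        numbers.append((idx, tok))

print(numbers)  # [(0, '1.5'), (4, '2')]
```

Because peek slices the sequence, it returns fewer than k items near the end instead of raising, which keeps the lookahead checks free of try/except blocks.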

