Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
500 views
in Technique[技术] by (71.8m points)

compiler construction - What is the precise definition of a lookahead set?

I'm toying around with writing compilers and learning about the theory behind syntax analysis. I've found that even though it's a key concept for understanding recognition algorithms, information about it on the net is fairly poor. It seems StackOverflow is in a unique position to fix this problem.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The lookahead sets for a grammar is defined in terms of the lookahead sets for each of its non-terminals, which in turn rely on the lookahead set for each production. Determining lookahead sets can help us determine if a grammar is LL(1), and if it is, what information we need to construct a recursive-descent parser for it.

Definition: LOOKAHEAD(X -> α) and LOOKAHEAD(X):

LOOKAHEAD(X -> α) = FIRST(α) U FOLLOW(X), if NULLABLE(α)
LOOKAHEAD(X -> α) = FIRST(α), if not NULLABLE(α)
LOOKAHEAD(X) = LOOKAHEAD(X -> α) U LOOKAHEAD(X -> β) U LOOKAHEAD(X -> γ)

where FIRST(α) is the set of terminals that α can begin with, FOLLOW(X) is the set of terminals that can come after an X anywhere in the grammar, and NULLABLE(α) is whether α can derive an empty sequence of terminals (denoted ε). The following definitions are taken from Torben Mogensen's free book, Basics of Compiler Design. See below for an example.

Definition: NULLABLE(X):

NULLABLE(ε) = true
NULLABLE(x) = false, if x is a terminal
NULLABLE(αβ) = NULLABLE(α) and NULLABLE(β)
NULLABLE(P) = NULLABLE(α_1) or NULLABLE(α_2) or ... or NULLABLE(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FIRST(X):

FIRST(ε) = ?
FIRST(x) = {x}, assuming x is a terminal
FIRST(αβ) = FIRST(α) U FIRST(β), if NULLABLE(α)
          = FIRST(α), if not NULLABLE(α)
FIRST(P) = FIRST(α_1) U FIRST(α_2) U ... U FIRST(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FOLLOW(X):

A terminal symbol a is in FOLLOW(X) if and only if there is a derivation from the start symbol S of the grammar such that S ? αX aβ, where α and β are (possibly empty) sequences of grammar symbols.

Intuition: FOLLOW(X):

Look at where X occurs in the grammar. All terminals that could follow it (directly or by any level of recursion) is in FOLLOW(X). Additionally, if X occurs at the end of a production (e.g. A -> foo X), or is followed by something else that can reduce to ε (e.g. A -> foo X B and B -> ε), then whatever A can be followed by, X can also be followed by (i.e. FOLLOW(A) ? FOLLOW(X)).

See the method for determining FOLLOW(X) in Torben's book and a demonstration of it just below.

An example:

E -> n A
A -> E B
A -> ε
B -> + A
B -> * A

First, NULLABLE and FIRST and are determined:

NULLABLE(E) = NULLABLE(n A) = NULLABLE(n) ∧ NULLABLE(A) = false
NULLABLE(A) = NULLABLE(E B) ∨ NULLABLE(ε) = true
NULLABLE(B) = NULLABLE(+ A) ∨ NULLABLE(* A) = false

FIRST(E) = FIRST(n A) = {n}
FIRST(A) = FIRST(E B) U FIRST(ε) = FIRST(E) U ? = {n} (because E is not NULLABLE)
FIRST(B) = FIRST(+ A) U FIRST(* A) = FIRST(+) U FIRST(*) = {+, *}

Before FOLLOW is determined, the production E' -> E $ is added, where $ is considered the "end-of-file" non-terminal. Then FOLLOW is determined:

FOLLOW(E): Let β = $, so add the constraint that FIRST($) = {$} ? FOLLOW(E)
           Let β = B, so add the constraint that FIRST(B) = {+, *} ? FOLLOW(E)
FOLLOW(A): Let β = ε, so add the constraint that FIRST(ε) = ? ? FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(E) ? FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = ? ? FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ? FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = ? ? FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ? FOLLOW(A).
FOLLOW(B): Let β = ε, so add the constraint that FIRST(ε) = ? ? FOLLOW(B).
           Because NULLABLE(ε), add the constraint that FOLLOW(A) ? FOLLOW(B).

Resolving these constraints (could also be achieved by fixed-point iteration),

    {+, *, $} ? FOLLOW(E)
    FOLLOW(E) ? FOLLOW(A)
    FOLLOW(A) = FOLLOW(B)

    FOLLOW(E) = FOLLOW(A) = FOLLOW(B) = {+, *, $}.

Now LOOKAHEAD for each production can be determined:

LOOKAHEAD(E -> n A) = FIRST(n A) = {n}     because ?NULLABLE(n A)
LOOKAHEAD(A -> E B) = FIRST(E B)           because ?NULLABLE(E B)
                    = FIRST(E) = {n}       because ?NULLABLE(E)
LOOKAHEAD(A -> ε)   = FIRST(ε) U FOLLOW(A) because NULLABLE(ε)
                    = ? U {+, *, $} = {+, *, $}
LOOKAHEAD(B -> + A) = FIRST(+ A)           because ?NULLABLE(+ A)
                    = FIRST(+) = {+}       because ?NULLABLE(+)
LOOKAHEAD(B -> * A) = {*}                  for the same reason

Finally, LOOKAHEAD for each non-terminal can be determined:

LOOKAHEAD(E) = LOOKAHEAD(E -> n A) = {n}
LOOKAHEAD(A) = LOOKAHEAD(A -> E B) U LOOKAHEAD(A -> ε)   = {n} U {+, *, $}
LOOKAHEAD(B) = LOOKAHEAD(B -> + A) U LOOKAHEAD(B -> * A) = {+, *}

By this knowledge we can determine that this grammar is not LL(1) because its non-terminals have overlapping lookahead sets. (I.e., we cannot create a program that reads one symbol at a time and decides unambiguously which production to use.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...