7. Context Free Languages
Jan 27, 2022
-
context free languages is a more general class of languages:
-
contains all regular languages (strict subset), and others
-
e.g. \(L = \{a^nb^n:n \geq 0\}\) and \(L = \{ww^R\}\) are context free but not regular
-
-
can be described with context free grammars or pushdown automata
-
primarily interesting for parsers
A language \(L\) is context free is there is a context free grammar \(G\) with \(L = L(G)\)
Grammars
-
grammars express languages using (a list of) production rules
-
productions replace variables with other variables or terminals
-
example: grammar for \(L = \{a^nb^n:n \geq 0\}\)
- \(S \rightarrow aSb\) | \(\epsilon\)
-
example: even length palindromes \(L(G) = \{ww^R: w \in \{a,b\}^*\}\)
- \(S \rightarrow aSb\) | \(SS\) | \(\epsilon\)
-
odd length palindromes (different language)
- \(S \rightarrow aSb\) | \(SS\) | \(a\) | \(b\) | \(\epsilon\)
-
notation hiding intermediate derivation steps \(w_1 \xrightarrow{\mathit{*}} w_n\)
-
\(\xrightarrow{\mathit{*}}\) means in zero or more derivation steps
-
trivially \(w \xrightarrow{\mathit{*}} w\)
-
Formal definition
Grammar \(G = (V, T, S, P)\) where
-
\(V\) is s set of variables
-
\(T\) is a set of terminal symbols
-
\(S\) is a start variable (\(S \in V\))
-
\(P\) is a set of productions
- all in form \(A \rightarrow s\) where \(s\) is string of variables and terminals
Language of grammar \(G\) with start variable \(S\)
-
\(L(G) = \{w: S\xrightarrow{\mathit{*}}w, w \in T^*\}\)
-
\(w\) is string of terminals of \(\epsilon\)
Derivation order and derivation trees
Example
Consider the grammar with 5 productions, and derivation of string \(aab\):
\(S \rightarrow AB\)
\(A \rightarrow aaA\) | \(\epsilon\)
\(B \rightarrow Bb\) | \(\epsilon\)
Leftmost derivation:
\(S \rightarrow AB \rightarrow aaAB \rightarrow aaB \rightarrow aaBb \rightarrow aab\)
Rightmost derivation:
\(S \rightarrow AB \rightarrow ABb \rightarrow Ab \rightarrow aaAb \rightarrow aab\)
Sometimes derivation order does not matter, both give the same derivation tree:
(Take the leaf nodes to get \(aab\))
1 2 3 4 5 6 7 |
|
Ambiguity
-
a context free grammar \(G\) is ambiguous is there is a string \(w \in L(G)\) which has two different derivation trees or two leftmost derivations
-
ambiguity may cause problems in applications using derivation trees e.g. evaluating expressions; in general compilers
-
Two ways to handle ambiguity:
-
(1) write non-ambiguous grammar (difficult)
-
(2) add rules to the parser to handle priority (practical solution)
-
-
It is not possible to write all grammars as non-ambiguous
- e.g. \(L = \{a^nb^nc^m\} \cup \{a^nb^mc^m\}, n, m \geq 0\) is inherently ambiguous
the string \(a^nb^nc^n \in L\) always has two different derivation trees for any grammar
- e.g. \(L = \{a^nb^nc^m\} \cup \{a^nb^mc^m\}, n, m \geq 0\) is inherently ambiguous
Example
\(E \rightarrow E + E\) | \(E * E\) | \((E)\) | \(a\)
(A) leftmost derivation of \(a + a * a\):
\(E \rightarrow E + E \rightarrow a + E \rightarrow a + E * E \rightarrow a + a * E \rightarrow a + a * a\)
1 2 3 4 5 6 7 |
|
(B) another leftmost derivation of \(a + a * a\):
\(E \rightarrow E * E \rightarrow E + E * E \rightarrow a + E * E \rightarrow a + a * E \rightarrow a + a * a\)
1 2 3 4 5 6 7 |
|
Bottom up evaluation for a=2: (A) gives correct result 2+2*2=6 but (B) does not 2+2*2=8.
This grammar is ambiguous because there are two different leftmost derivations.