Context-Free Languages
Summarized notes from Introduction to Theory of Computation, Chapter 2
- describable equivalently by context-free grammars and pushdown automata
- grammar describes how to generate the language, PDA how to recognize it
- can describe languages with recursive structures => particularly useful in parsers
- also note: every regular language is context free
Context Free Grammars (CFG)
4-tuple \((V, \Sigma, R, S)\)
\(V\) | variables (finite set) |
\(\Sigma\) | terminals (finite set, disjoint from \(V\)) |
\(R\) | rules (finite set): variable and a string of variables and terminals |
\(S \in V\) | start variable (by convention left of topmost rule) |
- language of the grammar is \(\{ w \in \Sigma^* | S \xrightarrow{\mathit{*}} w \}\)
- same information can be represented pictorially using a parse tree
Ambiguity
- grammar does not capture precedence relations, e.g. * and +
- leftmost derivation = at every step the leftmost remaining variable is the one replaced
-
for string \(w\) and context-free grammar \(G\):
- \(w\) is derived ambiguously in \(G\) if it has two or more different leftmost derivations
- \(G\) is ambiguous if it generates some string ambiguously
-
sometimes resolvable if unambiguous version of the same grammar exists
- some languages are inherently ambiguous, e.g. \(\{ a^ib^jc^k | i = j \text{ or } j = k \}\)
Chomsky normal form
Any context-free language is generated by a context-free grammar in Chomsky normal form. Context-free grammar is in Chomsky normal form if every rule is of the form:
where a is any terminal and A, B, and C are any variables—except that B and C may not be the start variable. In addition, we permit the rule \(S \rightarrow \epsilon\), where \(S\) is the start variable.
Pushdown automaton (PDA)
- FA + an unlimited stack: can push and pop symbols on a stack
- the stack enables recognizing some non-regular languages, e.g. \(\{0^n1^n | N \geq 0\}\)
- context-free grammars and pushdown automata are equivalent in power => always possible to transform one to the other
Definition
6-tuple \((Q, \Sigma, \Gamma, \delta, q_0, F)\) (nondeterministic)
\(Q\) | finite set of states |
\(\Sigma\) | finite input alphabet |
\(\Gamma\) | finite stack alphabet (may differ from \(\Sigma\)) |
\(\delta: Q \times \Sigma_{\epsilon} \times \Gamma_\epsilon \rightarrow \mathcal{P}(Q \times \Gamma_\epsilon)\) | transition function |
\(q_0 \in Q\) | start state |
\(F \subseteq Q\) | finite set of accept states |
- nondeterministic transition may have several legal next moves
- in a state diagram label "\(a, b \rightarrow c\)" means: read \(a\) from input, pop \(b\), push \(c\)
- at each transition any processed symbol may be \(\epsilon\)
- special symbol e.g. $ maybe pushed to the stack initially, to know when stack is empty
Pumping lemma for context free languages
If \(A\) is a context-free language, then there is a number \(p\) (the pumping length) where, if \(s\) is any string in \(A\) of length at least \(p\), then \(s\) may be divided into five pieces \(s = uvxyz\) satisfying the conditions:
- for each \(i \geq 0, uv^ixy^iz \in A\)
- \(|vy| > 0\), and
- \(|vxy| \leq p\)
Deterministic context-free languages (DCFLs)
- deterministic and nondeterministic PDAs are not equivalent in power: \(DPDA \subset NPDA\)
- languages that are recognizable by deterministic pushdown automata (DPDAs) are called deterministic context-free languages (DCFLs)
- automaton cannot make choices about how it proceeds because only a single possibility is available at every point
- the parsing problem is generally easier for DCFLs than for CFLs
- deterministic context-free grammars are equal to DPDAs, provided that we restrict to endmarked languages, where all strings include a terminating symbol (every string has a forced handle) => one can always be converted to the other
Definition
6-tuple \((Q, \Sigma, \Gamma, \delta, q_0, F)\) (deterministic)
\(Q\) | finite set of states |
\(\Sigma\) | finite input alphabet |
\(\Gamma\) | finite stack alphabet |
\(\delta: Q \times \Sigma_{\epsilon} \times \Gamma_\epsilon \rightarrow (Q \times \Gamma_\epsilon) \cup \{ \emptyset \}\) | transition function |
\(q_0 \in Q\) | start state |
\(F \subseteq Q\) | finite set of accept states |
The transition function \(\delta\) must satisfy the following condition: for every \(q \in Q, a \in \Sigma\), and \(x \in \Gamma\), exactly one of the values: \(\delta(q,a,x), \delta(q,a,\epsilon), \delta(q,\epsilon,x)\), and \(\delta(q,\epsilon,\epsilon)\) is not \(\emptyset\).
\(\epsilon\)-moves in the DPDA’s transition function are allowed (unlike in DFAs)
- may read an input symbol without popping a stack symbol (\(\epsilon\)-input move), and vice versa (\(\epsilon\)-stack move)
- \(\epsilon\)-input move and \(\epsilon\)-stack move maybe combined: \(\delta(q, \epsilon, \epsilon)\)
Acceptable moves based on stack state:
- When stack is nonempty, DPDA has exactly one legal move in every situation
- If the stack is empty, a DPDA can move only if the transition function specifies a move that pops \(\epsilon\)
- Otherwise the DPDA has no legal move and it rejects without reading the rest of the input