Skip to content

Context-Free Languages

Summarized notes from Introduction to Theory of Computation, Chapter 2

  • describable equivalently by context-free grammars and pushdown automata
    • grammar describes how to generate the language, PDA how to recognize it
  • can describe languages with recursive structures => particularly useful in parsers
  • also note: every regular language is context free

Context Free Grammars (CFG)

4-tuple \((V, \Sigma, R, S)\)

\(V\) variables (finite set)
\(\Sigma\) terminals (finite set, disjoint from \(V\))
\(R\) rules (finite set): variable and a string of variables and terminals
\(S \in V\) start variable (by convention left of topmost rule)
  • language of the grammar is \(\{ w \in \Sigma^* | S \xrightarrow{\mathit{*}} w \}\)
  • same information can be represented pictorially using a parse tree

Ambiguity

  • grammar does not capture precedence relations, e.g. * and +
  • leftmost derivation = at every step the leftmost remaining variable is the one replaced
  • for string \(w\) and context-free grammar \(G\):

    • \(w\) is derived ambiguously in \(G\) if it has two or more different leftmost derivations
    • \(G\) is ambiguous if it generates some string ambiguously
  • sometimes resolvable if unambiguous version of the same grammar exists

  • some languages are inherently ambiguous, e.g. \(\{ a^ib^jc^k | i = j \text{ or } j = k \}\)

Chomsky normal form

Any context-free language is generated by a context-free grammar in Chomsky normal form. Context-free grammar is in Chomsky normal form if every rule is of the form:

\[\begin{align*} A & \rightarrow BC \\ A & \rightarrow a \end{align*}\]

where a is any terminal and A, B, and C are any variables—except that B and C may not be the start variable. In addition, we permit the rule \(S \rightarrow \epsilon\), where \(S\) is the start variable.

Pushdown automaton (PDA)

  • FA + an unlimited stack: can push and pop symbols on a stack
  • the stack enables recognizing some non-regular languages, e.g. \(\{0^n1^n | N \geq 0\}\)
  • context-free grammars and pushdown automata are equivalent in power => always possible to transform one to the other

Definition

6-tuple \((Q, \Sigma, \Gamma, \delta, q_0, F)\) (nondeterministic)

\(Q\) finite set of states
\(\Sigma\) finite input alphabet
\(\Gamma\) finite stack alphabet (may differ from \(\Sigma\))
\(\delta: Q \times \Sigma_{\epsilon} \times \Gamma_\epsilon \rightarrow \mathcal{P}(Q \times \Gamma_\epsilon)\) transition function
\(q_0 \in Q\) start state
\(F \subseteq Q\) finite set of accept states
  • nondeterministic transition may have several legal next moves
  • in a state diagram label "\(a, b \rightarrow c\)" means: read \(a\) from input, pop \(b\), push \(c\)
  • at each transition any processed symbol may be \(\epsilon\)
  • special symbol e.g. $ maybe pushed to the stack initially, to know when stack is empty

Pumping lemma for context free languages

If \(A\) is a context-free language, then there is a number \(p\) (the pumping length) where, if \(s\) is any string in \(A\) of length at least \(p\), then \(s\) may be divided into five pieces \(s = uvxyz\) satisfying the conditions:

  1. for each \(i \geq 0, uv^ixy^iz \in A\)
  2. \(|vy| > 0\), and
  3. \(|vxy| \leq p\)

Deterministic context-free languages (DCFLs)

  • deterministic and nondeterministic PDAs are not equivalent in power: \(DPDA \subset NPDA\)
  • languages that are recognizable by deterministic pushdown automata (DPDAs) are called deterministic context-free languages (DCFLs)
  • automaton cannot make choices about how it proceeds because only a single possibility is available at every point
  • the parsing problem is generally easier for DCFLs than for CFLs
  • deterministic context-free grammars are equal to DPDAs, provided that we restrict to endmarked languages, where all strings include a terminating symbol (every string has a forced handle) => one can always be converted to the other

Definition

6-tuple \((Q, \Sigma, \Gamma, \delta, q_0, F)\) (deterministic)

\(Q\) finite set of states
\(\Sigma\) finite input alphabet
\(\Gamma\) finite stack alphabet
\(\delta: Q \times \Sigma_{\epsilon} \times \Gamma_\epsilon \rightarrow (Q \times \Gamma_\epsilon) \cup \{ \emptyset \}\) transition function
\(q_0 \in Q\) start state
\(F \subseteq Q\) finite set of accept states

The transition function \(\delta\) must satisfy the following condition: for every \(q \in Q, a \in \Sigma\), and \(x \in \Gamma\), exactly one of the values: \(\delta(q,a,x), \delta(q,a,\epsilon), \delta(q,\epsilon,x)\), and \(\delta(q,\epsilon,\epsilon)\) is not \(\emptyset\).

\(\epsilon\)-moves in the DPDA’s transition function are allowed (unlike in DFAs)

  • may read an input symbol without popping a stack symbol (\(\epsilon\)-input move), and vice versa (\(\epsilon\)-stack move)
  • \(\epsilon\)-input move and \(\epsilon\)-stack move maybe combined: \(\delta(q, \epsilon, \epsilon)\)

Acceptable moves based on stack state:

  • When stack is nonempty, DPDA has exactly one legal move in every situation
  • If the stack is empty, a DPDA can move only if the transition function specifies a move that pops \(\epsilon\)
  • Otherwise the DPDA has no legal move and it rejects without reading the rest of the input