Context-Free Languages

Summarized notes from Introduction to Theory of Computation, Chapter 2

describable equivalently by context-free grammars and pushdown automata
- grammar describes how to generate the language, PDA how to recognize it
can describe languages with recursive structures => particularly useful in parsers
also note: every regular language is context free

Context Free Grammars (CFG)

4-tuple $(V, Σ, R, S)$


$V$	variables (finite set)
$Σ$	terminals (finite set, disjoint from $V$ )
$R$	rules (finite set): variable and a string of variables and terminals
$S \in V$	start variable (by convention left of topmost rule)

language of the grammar is ${w \in Σ^{*} | S \overset{*}{\to} w}$
same information can be represented pictorially using a parse tree

Ambiguity

grammar does not capture precedence relations, e.g. * and +
leftmost derivation = at every step the leftmost remaining variable is the one replaced
for string $w$ and context-free grammar $G$ :
- $w$ is derived ambiguously in $G$ if it has two or more different leftmost derivations
- $G$ is ambiguous if it generates some string ambiguously
sometimes resolvable if unambiguous version of the same grammar exists
some languages are inherently ambiguous, e.g. ${a^{i} b^{j} c^{k} | i = j or j = k}$

Chomsky normal form

Any context-free language is generated by a context-free grammar in Chomsky normal form. Context-free grammar is in Chomsky normal form if every rule is of the form:

\begin{aligned} A & \to B C \\ A & \to a \end{aligned}

where a is any terminal and A, B, and C are any variables—except that B and C may not be the start variable. In addition, we permit the rule $S \to ϵ$ , where $S$ is the start variable.

Pushdown automaton (PDA)

FA + an unlimited stack: can push and pop symbols on a stack
the stack enables recognizing some non-regular languages, e.g. ${0^{n} 1^{n} | N \geq 0}$
context-free grammars and pushdown automata are equivalent in power => always possible to transform one to the other

Definition

6-tuple $(Q, Σ, Γ, δ, q_{0}, F)$ (nondeterministic)


$Q$	finite set of states
$Σ$	finite input alphabet
$Γ$	finite stack alphabet (may differ from $Σ$ )
$δ : Q \times Σ_{ϵ} \times Γ_{ϵ} \to P (Q \times Γ_{ϵ})$	transition function
$q_{0} \in Q$	start state
$F \subseteq Q$	finite set of accept states

nondeterministic transition may have several legal next moves
in a state diagram label " $a, b \to c$ " means: read $a$ from input, pop $b$ , push $c$
at each transition any processed symbol may be $ϵ$
special symbol e.g. $ maybe pushed to the stack initially, to know when stack is empty

Pumping lemma for context free languages

If $A$ is a context-free language, then there is a number $p$ (the pumping length) where, if $s$ is any string in $A$ of length at least $p$ , then $s$ may be divided into five pieces $s = u v x y z$ satisfying the conditions:

for each $i \geq 0, u v^{i} x y^{i} z \in A$
$| v y | > 0$ , and
$| v x y | \leq p$

Deterministic context-free languages (DCFLs)

deterministic and nondeterministic PDAs are not equivalent in power: $D P D A \subset N P D A$
languages that are recognizable by deterministic pushdown automata (DPDAs) are called deterministic context-free languages (DCFLs)
automaton cannot make choices about how it proceeds because only a single possibility is available at every point
the parsing problem is generally easier for DCFLs than for CFLs
deterministic context-free grammars are equal to DPDAs, provided that we restrict to endmarked languages, where all strings include a terminating symbol (every string has a forced handle) => one can always be converted to the other

Definition

6-tuple $(Q, Σ, Γ, δ, q_{0}, F)$ (deterministic)


$Q$	finite set of states
$Σ$	finite input alphabet
$Γ$	finite stack alphabet
$δ : Q \times Σ_{ϵ} \times Γ_{ϵ} \to (Q \times Γ_{ϵ}) \cup {\emptyset}$	transition function
$q_{0} \in Q$	start state
$F \subseteq Q$	finite set of accept states

The transition function $δ$ must satisfy the following condition: for every $q \in Q, a \in Σ$ , and $x \in Γ$ , exactly one of the values: $δ (q, a, x), δ (q, a, ϵ), δ (q, ϵ, x)$ , and $δ (q, ϵ, ϵ)$ is not $\emptyset$ .

$ϵ$ -moves in the DPDA’s transition function are allowed (unlike in DFAs)

may read an input symbol without popping a stack symbol ( $ϵ$ -input move), and vice versa ( $ϵ$ -stack move)
$ϵ$ -input move and $ϵ$ -stack move maybe combined: $δ (q, ϵ, ϵ)$

Acceptable moves based on stack state:

When stack is nonempty, DPDA has exactly one legal move in every situation
If the stack is empty, a DPDA can move only if the transition function specifies a move that pops $ϵ$
Otherwise the DPDA has no legal move and it rejects without reading the rest of the input