Context-Free Languages
Summarized notes from Introduction to Theory of Computation, Chapter 2
- describable equivalently by context-free grammars and pushdown automata
- grammar describes how to generate the language, PDA how to recognize it
- can describe languages with recursive structures => particularly useful in parsers
- also note: every regular language is context free
Context Free Grammars (CFG)
4-tuple
variables (finite set) | |
terminals (finite set, disjoint from |
|
rules (finite set): variable and a string of variables and terminals | |
start variable (by convention left of topmost rule) |
- language of the grammar is
- same information can be represented pictorially using a parse tree
Ambiguity
- grammar does not capture precedence relations, e.g. * and +
- leftmost derivation = at every step the leftmost remaining variable is the one replaced
-
for string
and context-free grammar : is derived ambiguously in if it has two or more different leftmost derivations is ambiguous if it generates some string ambiguously
-
sometimes resolvable if unambiguous version of the same grammar exists
- some languages are inherently ambiguous, e.g.
Chomsky normal form
Any context-free language is generated by a context-free grammar in Chomsky normal form. Context-free grammar is in Chomsky normal form if every rule is of the form:
where a is any terminal and A, B, and C are any variables—except that B and C may not be the start variable.
In addition, we permit the rule
Pushdown automaton (PDA)
- FA + an unlimited stack: can push and pop symbols on a stack
- the stack enables recognizing some non-regular languages, e.g.
- context-free grammars and pushdown automata are equivalent in power => always possible to transform one to the other
Definition
6-tuple
finite set of states | |
finite input alphabet | |
finite stack alphabet (may differ from |
|
transition function | |
start state | |
finite set of accept states |
- nondeterministic transition may have several legal next moves
- in a state diagram label "
" means: read from input, pop , push - at each transition any processed symbol may be
- special symbol e.g. $ maybe pushed to the stack initially, to know when stack is empty
Pumping lemma for context free languages
If
- for each
, and
Deterministic context-free languages (DCFLs)
- deterministic and nondeterministic PDAs are not equivalent in power:
- languages that are recognizable by deterministic pushdown automata (DPDAs) are called deterministic context-free languages (DCFLs)
- automaton cannot make choices about how it proceeds because only a single possibility is available at every point
- the parsing problem is generally easier for DCFLs than for CFLs
- deterministic context-free grammars are equal to DPDAs, provided that we restrict to endmarked languages, where all strings include a terminating symbol (every string has a forced handle) => one can always be converted to the other
Definition
6-tuple
finite set of states | |
finite input alphabet | |
finite stack alphabet | |
transition function | |
start state | |
finite set of accept states |
The transition function
- may read an input symbol without popping a stack symbol (
-input move), and vice versa ( -stack move) -input move and -stack move maybe combined:
Acceptable moves based on stack state:
- When stack is nonempty, DPDA has exactly one legal move in every situation
- If the stack is empty, a DPDA can move only if the transition function specifies a move that pops
- Otherwise the DPDA has no legal move and it rejects without reading the rest of the input