8. CFG Normal Forms
Feb 1, 2022
Substitution rules
- modify grammar to produce an equivalent grammar that produces same strings
- Substitute \(B \rightarrow y\) to obtain equivalent grammar:
Original \begin{align*} A & \rightarrow xBz \\ B & \rightarrow y_1 \end{align*} | Equivalent \begin{align*} A & \rightarrow xBz \text{ | } xy_1z \end{align*} |
\(\epsilon\)-productions
- when \(X\) is a nullable variable, also eliminate \(\epsilon\)-productions: \(X \rightarrow \epsilon\)
- after removing \(\epsilon\)-productions all nullable variables disappear (except for start variable)
Original \begin{align*} S & \rightarrow aMb \\ M & \rightarrow aMb \\ M & \rightarrow \epsilon \end{align*} | Equivalent \begin{align*} S & \rightarrow aMb \text{ | } ab \\ M & \rightarrow aMb \text{ | } ab \\ \end{align*} |
Unit productions
- unit production occurs when a single variable occurs on both sides, e.g. \(X \rightarrow Y\)
- eliminate unit productions using substitution rule
Original \begin{align*} S & \rightarrow aA \\ A & \rightarrow a \\ A & \rightarrow B \\ B & \rightarrow A \\ B & \rightarrow bb \\ \end{align*} |
Intermediate (substitute A → B) \begin{align*} S & \rightarrow aA \text{ | } aB \\ A & \rightarrow a \\ B & \rightarrow A \text{ | } B \\ B & \rightarrow bb \end{align*} |
Intermediate (remove B → B) \begin{align*} S & \rightarrow aA \text{ | } aB \\ A & \rightarrow a \\ B & \rightarrow A \\ B & \rightarrow bb \end{align*} |
Final equivalent (remove B → A) \begin{align*} S & \rightarrow aA \text{ | } aB \\ A & \rightarrow a \\ B & \rightarrow bb \end{align*} |
Useless variables & productions
- productions that do not generate any useful string with only terminals or is unreachable
- in grammar: \(S \rightarrow aSb\text{ | }\epsilon\text{ | }A\) \(A \rightarrow aA\)
\(A\) is useless because once it appears it is not possible to eliminate it and derivation never terminates - in grammar: \(S \rightarrow A\) \(A \rightarrow aA \text{ | } \epsilon\) \(B \rightarrow ba\)
\(B\) is useless because it cannot be reached from \(S\) - variable is useful when it eventually produces a string with only terminals, i.e. if there exists a derivation from \(S \rightarrow ... \rightarrow w \in L(G)\)
- a production \(A \rightarrow x\) is useless if any of its variables (on left or right) is useless:
1 2 3 4 5 6 7 |
|
Simplification steps
To remove all unwanted variables and productions:
- remove nullable variables
- remove unit productions
- remove useless variables
Removing useless variables
- find all variables that can produce strings with only terminals or \(\epsilon\) (possibly useful variables)
- generate a set of variable names that meet this condition
- repeat in rounds until no new variables can be found
- Remove productions that use variables that are not in the set identified in step 1
- Find all variables reachable from S
- use dependency graph where nodes are variables to identify unreachable variables
- Keep only reachable variables to produce final grammar with only useful variables
Chomsky normal form
- simplification is needed to produce a grammar in Chomsky normal form
- in CNF all productions have form: \(A \rightarrow BC\) or \(A \rightarrow a\)
- any grammar that does not generate \(\epsilon\) has a corresponding CNF representation
- if not in CNF, we can always obtain equivalent grammar in CNF
- CNF is useful for parsing and proving theorems and easy to find for any CFG that does not generate \(\epsilon\)
Converting to CNF
General procedure for obtaining CNF:
- remove nullable variables, unit productions, (optionally) useless variables
- then from every symbol \(a\), introduce new variable \(T_a\) and add production \(T_a \rightarrow a\)
- in productions with length \(\geq\) 2, replace \(a\) with \(T_a\)
- productions of form \(A \rightarrow a\) do not need to change
- split and replace any production of form \(A \rightarrow C_1C_2\dots C_n\) with:
\(A \rightarrow C_1V_1\)
\(V_1 \rightarrow C_2V_2\)
\(\vdots\)
\(V_{n-2} \rightarrow C_{n-1}C_n\)
introducing new intermediate variables \(V_1, V_2, \dots, V_{n-2}\)
Example: converting to Chomsky normal form
- introduce new variables for terminals: \(T_a, T_b, T_c\)
- introduce new intermediate variable \(V_1\) to break up \(S \rightarrow ABT_a\)
- introduce new intermediate variable \(V_2\) to break up \(A \rightarrow T_aT_aT_b\)
Original ≠ CNF \begin{align*} S & \rightarrow ABa \\ A & \rightarrow aab \\ B & \rightarrow Ac \end{align*} | Step 1 ≠ CNF \begin{align*} S & \rightarrow ABT_a \\ A & \rightarrow T_aT_aT_b \\ B & \rightarrow AT_c \\ T_a & \rightarrow a \\ T_b & \rightarrow b \\ T_c & \rightarrow c \\ \end{align*} | Step 2 ≠ CNF \begin{align*} S & \rightarrow AV_1 \\ V_1 & \rightarrow BT_a \\ A & \rightarrow T_aT_aT_b \\ B & \rightarrow AT_c \\ T_a & \rightarrow a \\ T_b & \rightarrow b \\ T_c & \rightarrow c \\ \end{align*} | Step 3 = CNF \begin{align*} S & \rightarrow AV_1 \\ V_1 & \rightarrow BT_a \\ A & \rightarrow T_aV_2 \\ V_2 & \rightarrow T_aT_b \\ B & \rightarrow AT_c \\ T_a & \rightarrow a \\ T_b & \rightarrow b \\ T_c & \rightarrow c \\ \end{align*} |
Greinbach normal form
- all productions have form \(A \rightarrow aV_1V_2 \dots V_k\) \(k \geq 0\)
- GNF is very good for parsing strings, better than Chomsky
- It is difficult to create a grammar in GNF or to convert to it
Example: converting to Greinbach normal form
Original ≠ GNF \begin{align*} S & \rightarrow abSb \\ S & \rightarrow aa \end{align*} | Final = GNF \begin{align*} S & \rightarrow aT_bST_b \\ S & \rightarrow aT_a \\ T_a & \rightarrow a \\ T_b & \rightarrow b \end{align*} |