8. CFG Normal Forms
Feb 1, 2022
Substitution rules
- modify grammar to produce an equivalent grammar that produces same strings
- Substitute
to obtain equivalent grammar:
Original
|
Equivalent
|
-productions
- when
is a nullable variable, also eliminate -productions: - after removing
-productions all nullable variables disappear (except for start variable)
Original
|
Equivalent
|
Unit productions
- unit production occurs when a single variable occurs on both sides, e.g.
- eliminate unit productions using substitution rule
Original |
Intermediate (substitute A → B) |
Intermediate (remove B → B) |
Final equivalent (remove B → A) |
Useless variables & productions
- productions that do not generate any useful string with only terminals or is unreachable
- in grammar:
is useless because once it appears it is not possible to eliminate it and derivation never terminates - in grammar:
is useless because it cannot be reached from - variable is useful when it eventually produces a string with only terminals, i.e. if there exists a derivation from
- a production
is useless if any of its variables (on left or right) is useless:
1 2 3 4 5 6 7 |
|
Simplification steps
To remove all unwanted variables and productions:
- remove nullable variables
- remove unit productions
- remove useless variables
Removing useless variables
- find all variables that can produce strings with only terminals or
(possibly useful variables)- generate a set of variable names that meet this condition
- repeat in rounds until no new variables can be found
- Remove productions that use variables that are not in the set identified in step 1
- Find all variables reachable from S
- use dependency graph where nodes are variables to identify unreachable variables
- Keep only reachable variables to produce final grammar with only useful variables
Chomsky normal form
- simplification is needed to produce a grammar in Chomsky normal form
- in CNF all productions have form:
or - any grammar that does not generate
has a corresponding CNF representation- if not in CNF, we can always obtain equivalent grammar in CNF
- CNF is useful for parsing and proving theorems and easy to find for any CFG that does not generate
Converting to CNF
General procedure for obtaining CNF:
- remove nullable variables, unit productions, (optionally) useless variables
- then from every symbol
, introduce new variable and add production- in productions with length
2, replace with - productions of form
do not need to change
- in productions with length
- split and replace any production of form
with:
introducing new intermediate variables
Example: converting to Chomsky normal form
- introduce new variables for terminals:
- introduce new intermediate variable
to break up - introduce new intermediate variable
to break up
Original ≠ CNF
|
Step 1 ≠ CNF
|
Step 2 ≠ CNF
|
Step 3 = CNF
|
Greinbach normal form
- all productions have form
- GNF is very good for parsing strings, better than Chomsky
- It is difficult to create a grammar in GNF or to convert to it
Example: converting to Greinbach normal form
Original ≠ GNF
|
Final = GNF
|