Skip to content

8. CFG Normal Forms

Feb 1, 2022

Substitution rules

  • modify grammar to produce an equivalent grammar that produces same strings
  • Substitute By to obtain equivalent grammar:
Original AxBzBy1 Equivalent AxBz | xy1z

ϵ-productions

  • when X is a nullable variable, also eliminate ϵ-productions: Xϵ
  • after removing ϵ-productions all nullable variables disappear (except for start variable)
Original SaMbMaMbMϵ Equivalent SaMb | abMaMb | ab

Unit productions

  • unit production occurs when a single variable occurs on both sides, e.g. XY
  • eliminate unit productions using substitution rule
Original
 
SaAAaABBABbb
Intermediate
(substitute A → B)
SaA | aBAaBA | BBbb
Intermediate
(remove B → B)
SaA | aBAaBABbb
Final equivalent
(remove B → A)
SaA | aBAaBbb

Useless variables & productions

  • productions that do not generate any useful string with only terminals or is unreachable
  • in grammar:   SaSb | ϵ | A     AaA
    A is useless because once it appears it is not possible to eliminate it and derivation never terminates
  • in grammar:   SA     AaA | ϵ     Bba
    B is useless because it cannot be reached from S
  • variable is useful when it eventually produces a string with only terminals, i.e. if there exists a derivation from S...wL(G)
  • a production Ax is useless if any of its variables (on left or right) is useless:
1
2
3
4
5
6
7
  VARIABLES     GRAMMAR    PRODUCTIONS
                S → aSb    
                S → ε      
                S → A  --- useless
    useless --- A → aA --- useless
    useless --- B → C ---- useless
    useless --- C → D ---- useless

Simplification steps

To remove all unwanted variables and productions:

  1. remove nullable variables
  2. remove unit productions
  3. remove useless variables

Removing useless variables

  1. find all variables that can produce strings with only terminals or ϵ (possibly useful variables)
    • generate a set of variable names that meet this condition
    • repeat in rounds until no new variables can be found
  2. Remove productions that use variables that are not in the set identified in step 1
  3. Find all variables reachable from S
    • use dependency graph where nodes are variables to identify unreachable variables
  4. Keep only reachable variables to produce final grammar with only useful variables

Chomsky normal form

  • simplification is needed to produce a grammar in Chomsky normal form
  • in CNF all productions have form:   ABC   or   Aa
  • any grammar that does not generate ϵ has a corresponding CNF representation
    • if not in CNF, we can always obtain equivalent grammar in CNF
  • CNF is useful for parsing and proving theorems and easy to find for any CFG that does not generate ϵ

Converting to CNF

General procedure for obtaining CNF:

  1. remove nullable variables, unit productions, (optionally) useless variables
  2. then from every symbol a, introduce new variable Ta and add production Taa
    • in productions with length 2, replace a with Ta
    • productions of form Aa do not need to change
  3. split and replace any production of form AC1C2Cn with:

    AC1V1
    V1C2V2

    Vn2Cn1Cn

    introducing new intermediate variables V1,V2,,Vn2

Example: converting to Chomsky normal form

  1. introduce new variables for terminals: Ta,Tb,Tc
  2. introduce new intermediate variable V1 to break up SABTa
  3. introduce new intermediate variable V2 to break up ATaTaTb
Original ≠ CNF SABaAaabBAc Step 1 ≠ CNF SABTaATaTaTbBATcTaaTbbTcc Step 2 ≠ CNF SAV1V1BTaATaTaTbBATcTaaTbbTcc Step 3 = CNF SAV1V1BTaATaV2V2TaTbBATcTaaTbbTcc

Greinbach normal form

  • all productions have form AaV1V2Vk   k0
  • GNF is very good for parsing strings, better than Chomsky
  • It is difficult to create a grammar in GNF or to convert to it

Example: converting to Greinbach normal form

Original ≠ GNF SabSbSaa Final = GNF SaTbSTbSaTaTaaTbb