Skip to content

11. Pumping Lemma for CFGs

Feb 10, 2022

Pumping lemma for context-free languages, also Bar-Hillem Lemma.

  • Take some infinite CFL that generates infinite number of strings
  • every infinite CFL can be pumped => if this property does not hold then language in not CFL

Choose a long enough string where variable(s) is repeated - example:

Grammar G

SABE|bBdAAa|aBbSD|ccDDd|dEeE|e

SaaBee     BbbBdd     Bcc

Saa(bb)0cc(dd)0eeL(G)     removed B, the repeating part

Saa(bb)2cc(dd)2eeL(G)     repeat B twice

Saa(bb)icc(dd)ieeL(G)     repeat B i times, for any i0

=> this is infinite language

For arbitrary grammars

Consider arbitrary infinite context-free language L. Let G be the grammar of L{ϵ}.

Take G so that is has no unit-productions and no ϵ-productions (see: normal forms).

Let r be the number of variables, and let t be the maximum right-hand size of any production, e.g. for G,   r=5 and t=3.

Claim: take string wL(G) with |w|>tr. Then in the derivation tree of w there is a path from the root to the leaf where a variable of G is repeated.

Proof (by contradiction): show some variable is repeated.

First show that three of w has at least r+2 levels of nodes:

  • maximum number of nodes per level:

    • 0: 1,
    • 1: t (the max right hand side of any production)
    • 2: t2
    • => in general at level r: tr nodes
  • therefore the max length of string w:|w|tr

  • however we took string |w|>tr => contradiction, therefore tree must have at least r+2 levels.

Thus, there is a path from the root to a leaf with at least r+2 nodes. By pigeonhole principle, since there are at most r different variables, some variable is repeated.

Take now a string w with |w|>tr, from claim: some variable H is repeated. Take H to be the deepest so that only H is repeated in subtree. We can write w=uvxyz, where u,v,x,y,z are strings of terminals.

Example. For grammar G, u=aa,v=bb,x=cc,y=dd,z=ee and B is H.

Possible derivations: SuHz     HvHy     Hx

SaaBee     HbbBdd     Bcc

Remove middle part:   SuHz and Hx   yields uxz=uv0xy0zL(G)

Repeat middle part:   yields uvvxyyx=uv2xy2zL(G)

Therefore, if we know for w, where |w|>tr, that w=uvxyzL(G) then we also know uvixizL(G) for all i0.

Observations:

  • |vy|1 since G has no unit and ϵ-productions → at least one of v or y is not ϵ
  • |vxy|tr+1 since in subtree only variable H is repeated
  • thus if we choose critical length p=tr+1>tr we obtain pumping lemma for context-free languages

Pumping lemma for CFGs

For any infinite context-free language L, there exists and integer p such that for any string wL, |w|p, we can write w=uvxyz with lengths |vxy|p and |vy|1 and it must be that uvixyizL for all i0.

Applications

L={anbncn:n0} use pumping lemma to prove L is not CFL (by contradiction)

Assume L is CFL, choose n=p, then examine by cases all possible locations of vxy:

  • case 1: vxy in ap
  • case 2: vxy in bp
  • case 3: vxy in cp
  • case 4: vxy overlaps ap and bp
    • subcase 1: y contains only a, y contains only b
    • subcase 2: v contains a and b, and y contains only b
    • subcase 3: v contains only a, y contains a and b
  • case 5: vxy overlaps bp and cp (similar to case 4)
  • case 6: vxy overlaps ap and bp and cp - impossible by |vxy|p

=> in all cases we obtain contradiction, therefore L is not context-free.