Skip to content

7. Context Free Languages

Jan 27, 2022

  • context free languages is a more general class of languages:

    • contains all regular languages (strict subset), and others

    • e.g. L={anbn:n0} and L={wwR} are context free but not regular

  • can be described with context free grammars or pushdown automata

  • primarily interesting for parsers


A language L is context free is there is a context free grammar G with L=L(G)


Grammars

  • grammars express languages using (a list of) production rules

  • productions replace variables with other variables or terminals

  • example: grammar for L={anbn:n0}

    • SaSb | ϵ
  • example: even length palindromes L(G)={wwR:w{a,b}}

    • SaSb | SS | ϵ
  • odd length palindromes (different language)

    • SaSb | SS | a | b | ϵ
  • notation hiding intermediate derivation steps w1wn

    • means in zero or more derivation steps

    • trivially ww

Formal definition

Grammar G=(V,T,S,P) where

  • V is s set of variables

  • T is a set of terminal symbols

  • S is a start variable (SV)

  • P is a set of productions

    • all in form As where s is string of variables and terminals

Language of grammar G with start variable S

  • L(G)={w:Sw,wT}

  • w is string of terminals of ϵ

Derivation order and derivation trees

Example

Consider the grammar with 5 productions, and derivation of string aab:

SAB
AaaA | ϵ
BBb | ϵ

Leftmost derivation:

SABaaABaaBaaBbaab

Rightmost derivation:

SABABbAbaaAbaab

Sometimes derivation order does not matter, both give the same derivation tree:

(Take the leaf nodes to get aab)

1
2
3
4
5
6
7
         S
     ⭩      ⭨
   A          B
 ⭩ 🠧 ⭨      ⭩  ⭨
a  a  A     B     b
      🠧     🠧
      ε     ε

Ambiguity

  • a context free grammar G is ambiguous is there is a string wL(G) which has two different derivation trees or two leftmost derivations

  • ambiguity may cause problems in applications using derivation trees e.g. evaluating expressions; in general compilers

  • Two ways to handle ambiguity:

    • (1) write non-ambiguous grammar (difficult)

    • (2) add rules to the parser to handle priority (practical solution)

  • It is not possible to write all grammars as non-ambiguous

    • e.g. L={anbncm}{anbmcm},n,m0 is inherently ambiguous
      the string anbncnL always has two different derivation trees for any grammar

Example

EE+E | EE | (E) | a

(A) leftmost derivation of a+aa:

EE+Ea+Ea+EEa+aEa+aa

1
2
3
4
5
6
7
         E
    ⭩    🠧   ⭨
   E     +      E
   🠧         ⭩  🠧  ⭨
   a        E   *    E
            🠧        🠧 
            a        a

(B) another leftmost derivation of a+aa:

EEEE+EEa+EEa+aEa+aa

1
2
3
4
5
6
7
             E
        ⭩    🠧   ⭨
       E     *     E
   ⭩  🠧  ⭨        🠧
   E   +   E       a
   🠧       🠧 
   a       a

Bottom up evaluation for a=2: (A) gives correct result 2+2*2=6 but (B) does not 2+2*2=8.

This grammar is ambiguous because there are two different leftmost derivations.