Skip to content

7. Context Free Languages

Jan 27, 2022

  • context free languages is a more general class of languages:

    • contains all regular languages (strict subset), and others

    • e.g. \(L = \{a^nb^n:n \geq 0\}\) and \(L = \{ww^R\}\) are context free but not regular

  • can be described with context free grammars or pushdown automata

  • primarily interesting for parsers


A language \(L\) is context free is there is a context free grammar \(G\) with \(L = L(G)\)


Grammars

  • grammars express languages using (a list of) production rules

  • productions replace variables with other variables or terminals

  • example: grammar for \(L = \{a^nb^n:n \geq 0\}\)

    • \(S \rightarrow aSb\) | \(\epsilon\)
  • example: even length palindromes \(L(G) = \{ww^R: w \in \{a,b\}^*\}\)

    • \(S \rightarrow aSb\) | \(SS\) | \(\epsilon\)
  • odd length palindromes (different language)

    • \(S \rightarrow aSb\) | \(SS\) | \(a\) | \(b\) | \(\epsilon\)
  • notation hiding intermediate derivation steps \(w_1 \xrightarrow{\mathit{*}} w_n\)

    • \(\xrightarrow{\mathit{*}}\) means in zero or more derivation steps

    • trivially \(w \xrightarrow{\mathit{*}} w\)

Formal definition

Grammar \(G = (V, T, S, P)\) where

  • \(V\) is s set of variables

  • \(T\) is a set of terminal symbols

  • \(S\) is a start variable (\(S \in V\))

  • \(P\) is a set of productions

    • all in form \(A \rightarrow s\) where \(s\) is string of variables and terminals

Language of grammar \(G\) with start variable \(S\)

  • \(L(G) = \{w: S\xrightarrow{\mathit{*}}w, w \in T^*\}\)

  • \(w\) is string of terminals of \(\epsilon\)

Derivation order and derivation trees

Example

Consider the grammar with 5 productions, and derivation of string \(aab\):

\(S \rightarrow AB\)
\(A \rightarrow aaA\) | \(\epsilon\)
\(B \rightarrow Bb\) | \(\epsilon\)

Leftmost derivation:

\(S \rightarrow AB \rightarrow aaAB \rightarrow aaB \rightarrow aaBb \rightarrow aab\)

Rightmost derivation:

\(S \rightarrow AB \rightarrow ABb \rightarrow Ab \rightarrow aaAb \rightarrow aab\)

Sometimes derivation order does not matter, both give the same derivation tree:

(Take the leaf nodes to get \(aab\))

1
2
3
4
5
6
7
         S
     ⭩      ⭨
   A          B
 ⭩ 🠧 ⭨      ⭩  ⭨
a  a  A     B     b
      🠧     🠧
      ε     ε

Ambiguity

  • a context free grammar \(G\) is ambiguous is there is a string \(w \in L(G)\) which has two different derivation trees or two leftmost derivations

  • ambiguity may cause problems in applications using derivation trees e.g. evaluating expressions; in general compilers

  • Two ways to handle ambiguity:

    • (1) write non-ambiguous grammar (difficult)

    • (2) add rules to the parser to handle priority (practical solution)

  • It is not possible to write all grammars as non-ambiguous

    • e.g. \(L = \{a^nb^nc^m\} \cup \{a^nb^mc^m\}, n, m \geq 0\) is inherently ambiguous
      the string \(a^nb^nc^n \in L\) always has two different derivation trees for any grammar

Example

\(E \rightarrow E + E\) | \(E * E\) | \((E)\) | \(a\)

(A) leftmost derivation of \(a + a * a\):

\(E \rightarrow E + E \rightarrow a + E \rightarrow a + E * E \rightarrow a + a * E \rightarrow a + a * a\)

1
2
3
4
5
6
7
         E
    ⭩    🠧   ⭨
   E     +      E
   🠧         ⭩  🠧  ⭨
   a        E   *    E
            🠧        🠧 
            a        a

(B) another leftmost derivation of \(a + a * a\):

\(E \rightarrow E * E \rightarrow E + E * E \rightarrow a + E * E \rightarrow a + a * E \rightarrow a + a * a\)

1
2
3
4
5
6
7
             E
        ⭩    🠧   ⭨
       E     *     E
   ⭩  🠧  ⭨        🠧
   E   +   E       a
   🠧       🠧 
   a       a

Bottom up evaluation for a=2: (A) gives correct result 2+2*2=6 but (B) does not 2+2*2=8.

This grammar is ambiguous because there are two different leftmost derivations.