What are Context Free Languages?

Grammars, Derivation, Expressiveness, Chomsky Hierarchy

Jake Batsuuri
Computronium Blog

--

Previously, we talked about how languages are studied using the notion of a formal language. A formal language is a mathematical construction that uses sets to describe a language and understand its properties.

We introduced the notion of a string, which is a word or sequence of characters, symbols or letters. Then we formally defined the alphabet, which is a set of symbols. The alphabet often goes hand in hand with the language, because we define a formal language as a set of strings over a given alphabet.

Then we explored some operations on strings, and some operations on languages.

Exploring these operations naturally led us to the properties of languages, and of the languages that result from these operations.

The two main topics of the article were regular expressions and finite-state automata, which are computational paradigms that help us express or generate regular languages.

Then we explored the properties of the finite-state automaton itself. We said that the FSA is an information generating machine, and that this process of information generation can be deterministic or nondeterministic.

And the main topic of the article was to explore operations on FSAs, and whether regular languages are closed under these operations.

Upon proving that regular languages are closed under these operations, such as concatenation, we applied them to morphology, in the sense that we can define an FSA that modularly builds words from affixes.

Beyond Regular

Eventually we would like to study and understand natural languages, such as English. So far we learned about regular languages, which are neat, but we know English can’t be a regular language.

How do we know English is not a Regular Language?

If we consider English as just a set of words, which it is.

And we construct a regular expression for each word, which we technically could.

Then English is a regular language, since the union of finitely many regular languages is a regular language.

Higher Linguistic Abstractions

But the problem arises when we realize that English is more than just a set of words. For example, take the sentence “I ate an apple”, which is a really simple sentence.

If English were just a set of words, then the order of a group of words shouldn’t matter.

[
[ 'i', 'ate', 'an', 'apple' ],
[ 'ate', 'i', 'an', 'apple' ],
[ 'an', 'i', 'ate', 'apple' ],
[ 'i', 'an', 'ate', 'apple' ],
[ 'ate', 'an', 'i', 'apple' ],
[ 'an', 'ate', 'i', 'apple' ],
[ 'apple', 'ate', 'i', 'an' ],
[ 'ate', 'apple', 'i', 'an' ],
[ 'i', 'apple', 'ate', 'an' ],
[ 'apple', 'i', 'ate', 'an' ],
[ 'ate', 'i', 'apple', 'an' ],
[ 'i', 'ate', 'apple', 'an' ],
[ 'i', 'an', 'apple', 'ate' ],
[ 'an', 'i', 'apple', 'ate' ],
[ 'apple', 'i', 'an', 'ate' ],
[ 'i', 'apple', 'an', 'ate' ],
[ 'an', 'apple', 'i', 'ate' ],
[ 'apple', 'an', 'i', 'ate' ],
[ 'apple', 'an', 'ate', 'i' ],
[ 'an', 'apple', 'ate', 'i' ],
[ 'ate', 'apple', 'an', 'i' ],
[ 'apple', 'ate', 'an', 'i' ],
[ 'an', 'ate', 'apple', 'i' ],
[ 'ate', 'an', 'apple', 'i' ]
]
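For reference, here is a short Python sketch that produces the same 24 orderings (possibly in a different order), using only the standard library:

from itertools import permutations

# All 4! = 24 orderings of the four words.
for p in permutations(['i', 'ate', 'an', 'apple']):
    print(list(p))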

So all these permutations of the words should have the same meaning. But they don’t. Consider these two permutations:

  • I ate an apple.
  • Apple ate an I.

If we disregard the “I/me” rule, it’s kinda clear that the meaning changes. Furthermore, it’s also clear that many of these permutations are meaningless.

So order matters. Which makes sentences sequences, not sets.

But what is a sentence? A sentence is a group of words where repetition is allowed and order matters.

Need for Rules

Notice that when we have a sentence of 4 words, we have 4! ways to order them. So out of the 24 different orderings, only one really makes sense.

For an average sentence length of 20 words in modern English, that is 20! = 2,432,902,008,176,640,000 different permutations to check. This is not really computationally tractable, for computers or for brains. Hence the need for ways to speed up sentence creation.
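As a quick sanity check on these counts, in Python:

import math

print(math.factorial(4))   # 24
print(math.factorial(20))  # 2432902008176640000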

The solution for creating well-formed sentences is rules, or grammars. Think of these as computational shortcuts.

Grammars

A grammar is a set of rules that manipulate symbols.

Terminal

Elements of the target language

Elements of some finite set:

  • Words
  • Letters

Non-terminal

Auxiliary symbols that facilitate the specification of the language

Syntactic categories:

  • Sentence
  • Noun phrase
  • Verb phrase

Rules

A rule is a non-empty sequence of symbols, a mixture of terminals and non-terminals, with the only requirement that the first element in the sequence be a non-terminal. (Alternatively, one can define a rule as an ordered pair whose first element is a non-terminal symbol and whose second element is a sequence of symbols.)

Given this, we are really creating recipes for building sentences and phrases.

We write such rules with a special symbol, ‘→,’ separating the distinguished leftmost non-terminal from the rest of the sequence. The leftmost non-terminal is sometimes referred to as the head of the rule, while the rest of the symbols are called the body of the rule.

  • Non-terminal → Terminal
  • Head of the rule → Body of the rule

For natural languages it is the case that the body of a rule is either all terminals or all non-terminals, meaning the two kinds don’t mix. But there is nothing in the formalism that requires such a constraint on grammars.

Given the terminals {the, cat, in, hat} and the non-terminals {D, N, P, NP, PP}, where:

  • D is determiner
  • N is noun
  • P is preposition
  • NP is noun phrase
  • PP is preposition phrase

Class I (all terminal) examples of rules:

  • D → the
  • N → cat
  • N → hat
  • P → in

Class II (all non-terminal) examples of rules:

  • NP → D N
  • PP → P NP
  • NP → NP PP

Context Free Grammars

DEFINITION 1. A context free grammar (CFG) is a four-tuple G = ⟨ V, Σ, P, S ⟩, where:

  • V is a finite set of non-terminal symbols
  • Σ is an alphabet of terminal symbols
  • P ⊂ V × (V ∪ Σ)* is a set of rules
  • S ∈ V is the start symbol

For example, take the grammar G = ⟨ V, Σ, P, S ⟩, where V = {D, N, P, NP, PP}, Σ = {the, cat, in, hat}, P is the set of rules listed above, and the start symbol S is NP.

Generally we introduce grammars by listing only their rules, the set P.
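As a concrete sketch, here is the same grammar written with NLTK, the toolkit used elsewhere in this series. CFG.fromstring reads rules in the same head → body notation, with terminals quoted:

import nltk

# The example grammar: V = {D, N, P, NP, PP}, Σ = {the, cat, in, hat},
# start symbol NP (fromstring takes the first rule's head as the start).
grammar = nltk.CFG.fromstring("""
NP -> D N | NP PP
PP -> P NP
D -> 'the'
N -> 'cat' | 'hat'
P -> 'in'
""")

print(grammar.start())             # NP
print(len(grammar.productions()))  # 7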

Derivations

DEFINITION 2. Let G = ⟨ V, Σ, P, S ⟩ be a grammar. The set of forms induced by G is (V ∪ Σ)*:

  • A form α immediately derives a form β, denoted by α → β, if and only if there exist γₗ, γᵣ ∈ (V ∪ Σ)* such that α = γₗAγᵣ and β = γₗγγᵣ, and A → γ is a rule in P
  • A is called the selected symbol
  • A form α derives a form β, denoted by α →* β, if and only if β can be obtained from α by zero or more immediate derivation steps (→* is the reflexive transitive closure of →)

So what do forms and derivations have to do with context free languages?

We formally define these concepts so that we can define the language generated by a context free grammar: L(G) = {w ∈ Σ* | S →* w}, the set of all strings of terminal symbols derivable from the start symbol S.

This basically says that the language generated by a grammar is the set of terminal strings you can derive from its start symbol.
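With the grammar object from the NLTK sketch above, we can enumerate a finite slice of L(G); the generate helper bounds the derivation depth so that the recursive rule NP → NP PP terminates:

from nltk.parse.generate import generate

# Enumerate strings of L(G), up to derivation trees of depth 5:
# the cat, the hat, the cat in the hat, ...
for words in generate(grammar, depth=5):
    print(' '.join(words))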

EXAMPLE 1. One possible set of rules P over these two sets:

  • V = {D, N, P, NP, PP},
  • Σ = {the, cat, in, hat },

is exactly the set of Class I and Class II rules listed earlier.

EXAMPLE 2. Let the grammar be G = ⟨ V, Σ, P, S ⟩ where:

  • V = {D, N, P, NP, PP},
  • Σ = {the, cat, in, hat },
  • P is the set of rules above, and
  • the start symbol S is NP.

One possible derivation in G is:

NP → NP PP → D N PP → the N PP → the cat PP → the cat P NP → the cat in NP → the cat in D N → the cat in the N → the cat in the hat

Derivation Trees

These are sometimes called parse trees; they help us visualize what the derivations are doing. You can think of them as imposing a structure or form on a string of words.

Some tree terminology:

  • Each node of the tree is a vertex
  • The root is the topmost node, labeled by the start symbol S
  • If a vertex has an outgoing branch, it must be labeled by a non-terminal symbol
  • This implies that such a parent node is the head of a rule, and its children are the body of that rule
  • A leaf is a node with no outgoing branches
  • Derivation trees have a natural left-to-right order
  • If you read the leaves in this order, you get what we call the yield of the tree

Ambiguity

We can express ambiguity using derivation trees. Consider these two readings of the same phrase:

  • (the cat in the hat) in the hat
  • the cat in (the hat in the hat)

These have different meanings even though, without the brackets, they are exactly the same string of words.

We call this structural or syntactic ambiguity.
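We can see the ambiguity mechanically: a chart parser returns every derivation tree for a string, so the ambiguous string above gets two trees. A sketch, reusing the NLTK grammar object defined earlier:

import nltk

# An ambiguous string yields one tree per reading:
# here, two different PP-attachment trees.
parser = nltk.ChartParser(grammar)
for tree in parser.parse('the cat in the hat in the hat'.split()):
    tree.pretty_print()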

Expressiveness

We will show here that we can express all regular languages using grammars, and that grammars can generate even more, implying that the class of regular languages is properly contained within the class of context free languages.

We say that context free languages are more expressive than regular languages.

Right Linear Grammar

In this type of grammar, the body of the rule consists of a sequence of terminal symbols:

  • i.e. Σ = {the, cat, in, hat }

optionally followed by at most one non-terminal, which must come last; no body ever contains more than one non-terminal.

If a body contains no non-terminal at all, the derivation stops there: terminal symbols are leaf nodes.

This sub-type of grammar generates all and only the regular languages.
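As a small illustration (the grammar below is my own example, not from the original article), here is a right linear grammar for the regular language denoted by the regular expression (ab)+, run with the same NLTK tools:

import nltk
from nltk.parse.generate import generate

# Right linear: each body is terminals, optionally followed by
# a single trailing non-terminal.
rl = nltk.CFG.fromstring("""
S -> 'a' B
B -> 'b' S | 'b'
""")

# Prints the strings ab, abab, ababab (up to the depth bound).
for words in generate(rl, depth=8):
    print(''.join(words))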

Chomsky Hierarchy

We learned so far about context free grammars and right linear grammars and the languages they generate.

If we relax our definition of context free grammars, we can introduce context sensitive grammars, which have even more expressiveness than CFGs.

Context Sensitive Languages

In the head of a rule we have not only a single non-terminal but also its context: rules take the form αAβ → αγβ, which rewrites A as γ only when A appears between α and β. The formalism that corresponds to this class of languages is called the Context Sensitive Grammar.

Note that context free languages are a special case of context sensitive languages, where the context is empty. A canonical example of a language that is context sensitive but not context free is {aⁿbⁿcⁿ | n ≥ 1}.

Recursively Enumerable Languages

Now above the context sensitive grammars sit the general phrase-structure grammars, also called unrestricted grammars, which generate the recursively enumerable languages.

Natural Languages

Natural languages are mostly context free, and only some specific constructions in natural languages necessitate context sensitive power. This motivated the development of the mildly context sensitive category.

Weak Generative Capacity

Weak generative capacity is a property of computational formalisms. For example, when we say “the weak generative capacity of regular expressions is insufficient for generating English”, we are saying “regular expressions can’t generate all the words or sentences in English”.

Strong Generative Capacity

So far we have considered only the generation of words or sentences by grammars. But the structure is just as important, if not more so. A formalism with strong generative capacity not only weakly generates all the strings in the language, it also assigns the correct structures to them.

Mildly Context Sensitive Languages

These are almost an extension of the context free languages. This class of languages:

  • properly contains all the context free languages
  • can be parsed in polynomial time
  • can properly account for the constructions in natural languages that CFGs can’t, such as cross serial dependencies
  • has the linear growth property

Tree Adjoining Grammars

TAG constructs larger trees from smaller ones, so that basic operations that take place during derivations are not limited to string concatenation.

These operations facilitate nesting of one tree within another, resulting in extended expressiveness.

Natural Language Grammars

Several other grammars are proposed as adequate for expressing natural languages:

  • Head Grammars
  • Linear Indexed Grammars
  • Combinatory Categorial Grammars

All three are weakly equivalent to TAG, which could imply that TAG may just be the correct formalism class in which all natural languages reside.

Trade Offs

We learned so far that a language is just a set of strings; however, we can still classify languages into contained layers.

  • Regular languages
  • Context free languages
  • Mildly context sensitive languages
  • Context sensitive languages
  • Recursively enumerable languages

Each layer of languages has a corresponding computational formalism called a grammar, and each outer layer gives us more expressiveness.

However, this expressiveness does come at a computational cost, because each grammar comes with a computational model as well.

  • Regular languages → Finite state automata
  • Context free languages → Pushdown automata
  • Mildly context sensitive languages → Embedded pushdown automata (for the TAG family)
  • Context sensitive languages → Linear bounded automata
  • Recursively enumerable languages → Turing machines

And linguistics tasks such as:

  • recognizing sentences, phrases, bigrams, hapaxes, words, morphemes
  • determining their structure at each level
  • extracting their meaning
  • manipulating the information they contain

are all computational tasks that come with algorithms, and those algorithms have space and time demands.

For example, checking a string w’s membership in a language generated by a tree adjoining grammar takes time proportional to |w|⁶.

Checking membership of w ∈ L(G), where G is context free, takes time proportional to |w|³, using the computing model of pushdown automata: an FSA augmented with unbounded memory in the form of a stack.
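To make the |w|³ bound concrete, here is a minimal CYK-style membership checker. This is a sketch of the standard dynamic-programming algorithm, not code from the original article; the function name cyk and the rule encoding are my own. It assumes the grammar is in Chomsky normal form, which our example grammar already is:

def cyk(words, unary, binary, start='NP'):
    """Return True iff `words` is derivable from `start`.

    unary:  list of (A, terminal) rules, e.g. ('D', 'the')
    binary: list of (A, B, C) rules for A -> B C
    Runs in O(n^3 * |rules|) time for n = len(words).
    """
    n = len(words)
    # table[i][j]: the set of non-terminals deriving words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = {A for A, t in unary if t == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):            # split point
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]

# Our example grammar, which happens to be in Chomsky normal form:
unary = [('D', 'the'), ('N', 'cat'), ('N', 'hat'), ('P', 'in')]
binary = [('NP', 'D', 'N'), ('PP', 'P', 'NP'), ('NP', 'NP', 'PP')]

print(cyk('the cat in the hat'.split(), unary, binary))  # True
print(cyk('hat the in cat the'.split(), unary, binary))  # False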

The complexity of these models and the algorithms is what we will learn about next.

Other Articles

This post is part of a series of stories that explores the fundamentals of natural language processing:

1. Context of Natural Language Processing (Motivations, Disciplines, Approaches, Outlook)
2. Notes on Formal Language Theory (Objects, Operations, Regular Expressions and Finite State Automata)
3. Natural Language Tool Kit 3.5 (Search Functions, Statistics, Pronoun Resolution)
4. What Are Regular Languages? (Minimization, Finite State Transducers, Regular Relations)
5. What Are Context Free Languages? (Grammars, Derivations, Expressiveness, Hierarchies)

Up Next…

In the next article, we will explore Computational Complexity of Natural Languages.

For the table of contents and more content click here.

