Title: Introduction to Language Theory
1Introduction to Language Theory
Programming Language Translators
 Prepared by
 Manuel E. Bermúdez, Ph.D.
 Associate Professor
 University of Florida
2Introduction to Language Theory
 Definition: An alphabet (or vocabulary) Σ is a finite set of symbols.
 Example: Alphabet of Pascal:
   / <  (operators)
   begin end if var  (keywords)
   <identifier>  (identifiers)
   <string>  (strings)
   <integer>  (integers)
   , ( )  (punctuators)
 Note: All identifiers are represented by one symbol, because Σ must be finite.
3Introduction to Language Theory
 Definition: A sequence t = t1 t2 ... tn of symbols from an alphabet Σ is a string.
 Definition: The length of a string t = t1 t2 ... tn (denoted |t|) is n. If n = 0, the string is ε, the empty string.
 Definition: Given strings s = s1 s2 ... sn and t = t1 t2 ... tm, the concatenation of s and t, denoted st, is the string s1 s2 ... sn t1 t2 ... tm.
4Introduction to Language Theory
 Note: εu = u = uε and uεv = uv, for any strings u, v (including ε).
 Definition: Σ* is the set of all strings of symbols from Σ.
 Note: Σ* is called the reflexive, transitive closure of Σ.
 Σ* is described by the graph (Σ*, ·), where · denotes concatenation, and there is a designated start node, ε.
5Introduction to Language Theory
 Example: Σ = {a, b}.
 (Σ*, ·)
 Σ* is countably infinite, so we cannot compute all of Σ*; we can only compute finite subsets of Σ*. We can, however, compute whether a given string is in Σ*.
 [Figure: part of the graph (Σ*, ·) for Σ = {a, b}. Edges are labeled a or b; the nodes shown include aa, ab, ba, bb, aba, abb.]
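 The point about computing finite subsets and testing membership can be made concrete. The following is a minimal Python sketch, assuming every symbol is a single character; the names `sigma_star` and `in_sigma_star` are illustrative and do not come from the slides. The generator lists Σ* shortest-first, so any finite prefix of the enumeration can be computed, and the membership test only has to check each symbol.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first (the empty string comes first)."""
    symbols = sorted(alphabet)
    for length in count(0):
        # product(..., repeat=0) yields one empty tuple, which gives the empty string.
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


def in_sigma_star(string, alphabet):
    """A string is in Sigma* exactly when each of its symbols is in the alphabet."""
    return all(symbol in alphabet for symbol in string)


# Only a finite prefix of the infinite enumeration is ever materialized.
first_seven = list(islice(sigma_star({"a", "b"}), 7))
print(first_seven)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

 Taking the first seven strings covers all strings of length 0, 1, and 2 over {a, b}.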
6Introduction to Language Theory
 Example: Σ = Pascal vocabulary.
 Σ* = all possible alleged Pascal programs, i.e. all possible inputs to a Pascal compiler.
 We need to specify L ⊆ Σ*, the correct Pascal programs.
 Definition: A language L over an alphabet Σ is a subset of Σ*.
7Introduction to Language Theory
 Example: Σ = {a, b}.
 L1 = ∅ is a language
 L2 = {ε} is a language
 L3 = {a} is a language
 L4 = {a, ba, bbab} is a language
 L5 = {a^n b^n | n ≥ 0} is a language,
   where a^n = aa...a, n times
 L6 = {a, aa, aaa, ...} is a language
 Note: L5 is an infinite language, but described finitely.
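 Although L5 is infinite, membership in it can be decided by a finite procedure. A possible Python sketch follows; the function name `in_l5` is hypothetical, and strings are ordinary Python `str` values over {a, b}.

```python
def in_l5(string):
    """Return True iff string == 'a' * n + 'b' * n for some n >= 0."""
    half, remainder = divmod(len(string), 2)
    # An odd-length string cannot split into equally long runs of a's and b's.
    return remainder == 0 and string == "a" * half + "b" * half


print(in_l5("aabb"), in_l5("abab"))  # True False
```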
8Introduction to Language Theory
 THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION
 To describe (infinite) programming languages finitely, and to provide corresponding finite inclusion-test algorithms.
9Language Constructors
 Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set {uv | u ∈ L1, v ∈ L2}.
 Example: L1 = {ε, a, bb}, L2 = {ac, c}
 L1L2 = {ac, c, aac, ac, bbac, bbc}
      = {ac, c, aac, bbac, bbc}
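 For finite languages the catenation can be computed directly. A small Python sketch, assuming languages are represented as sets of `str` with ε written as the empty string (the name `catenate` is illustrative):

```python
def catenate(first, second):
    """Return {uv | u in first, v in second} for finite languages given as sets of str."""
    return {u + v for u in first for v in second}


L1 = {"", "a", "bb"}   # "" plays the role of the empty string
L2 = {"ac", "c"}
print(sorted(catenate(L1, L2)))  # ['aac', 'ac', 'bbac', 'bbc', 'c']
```

 Using a set removes the duplicate ac automatically, matching the second line of the example.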
10Language Constructors
 Definition: L^n = LL...L (n times), and L^0 = {ε}.
 Example: L = {a, bb}
 L^3 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
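 The power L^n can be computed by repeated catenation, starting from L^0 = {ε}. A minimal Python sketch under the same representation (finite sets of `str`; the name `power` is an assumption for illustration):

```python
def power(language, n):
    """Return L^n for a finite language L given as a set of str; L^0 is {""}."""
    result = {""}
    for _ in range(n):
        # One more catenation with L on the right.
        result = {u + v for u in result for v in language}
    return result


print(sorted(power({"a", "bb"}, 3)))
```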
11Language Constructors
 Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}.
 Definition: The Kleene star (L*) of a language L is the set L* = ∪ L^n, n ≥ 0.
 Example: L = {a, bb}
 L* = {any string composed of a's and bb's}
 Definition: The Transitive Closure (L+) of a language L is the set L+ = ∪ L^n, n ≥ 1.
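 L* is usually infinite, so it cannot be listed, but membership in L* can be tested for a finite L. The following Python sketch uses a simple dynamic-programming pass over prefixes; this technique is a choice made here for illustration, not something the slides prescribe, and the name `in_kleene_star` is hypothetical.

```python
def in_kleene_star(string, language):
    """Return True iff `string` is a concatenation of zero or more words of `language`."""
    # reachable[i] is True when the prefix string[:i] belongs to L*.
    reachable = [True] + [False] * len(string)
    for end in range(1, len(string) + 1):
        reachable[end] = any(
            reachable[end - len(word)]
            for word in language
            # Empty words never help; endswith checks string[0:end] against word.
            if word and string.endswith(word, 0, end)
        )
    return reachable[len(string)]


print(in_kleene_star("abba", {"a", "bb"}), in_kleene_star("bbb", {"a", "bb"}))  # True False
```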
12Language Constructors
 Note:
 In general, L* = L+ ∪ {ε}, but L+ ≠ L* − {ε}.
 For example, consider L = {ε}. Then
 {ε} = L+ ≠ L* − {ε} = {ε} − {ε} = ∅.
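 The counterexample can be checked on bounded approximations of the two closures. This is a sketch only: `bounded_closures` is a hypothetical helper that unions the powers L^n up to a chosen limit, which is enough to see the difference for small finite languages.

```python
def bounded_closures(language, limit):
    """Return (star, plus): the unions of L^n for 0 <= n <= limit and for 1 <= n <= limit."""
    current = {""}                 # L^0
    star, plus = {""}, set()
    for _ in range(limit):
        current = {u + v for u in current for v in language}  # next power of L
        star |= current
        plus |= current
    return star, plus


star, plus = bounded_closures({""}, 3)      # L = {empty string}
print(plus, star - {""})                    # {''} set()
```

 When ε is not in L, as for L = {a, bb}, the two sets agree at every bound; the identity fails only because ε itself belongs to L here.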
13Grammars
 Goal: Providing a means for describing languages finitely.
 Method: Provide a subgraph (Σ*, ⇒) of (Σ*, ·), and a start node S, such that the set of nodes reachable from S is exactly the set of strings in the language.
14Grammars
 Example: Σ = {a, b}
 L = {a^n b^n | n ≥ 0}
 [Figure: part of the graph (Σ*, ·) for Σ = {a, b}, with edges labeled a or b; the nodes shown include aa, ab, ba, bb, aaa, aab, bba, bbb, aaba, aabb, bbaa, bbab.]
15Grammars
 ⇒ (derives) is a relation defined by a finite set of rewrite rules known as productions.
 Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left part; v is called the right part.
16Grammars
 Example: Pseudo-English.
 V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}
 Sentence → NP VP (one production)
 NP → N
 NP → Adj NP
 N → boy
 N → girl
 Adj → the
 Adj → tall
 Adj → jealous
 VP → V NP
 V → hit
 V → bit
 Note: English is much too complicated to be described this way.
17Grammars
 Definition:
 Given a finite set of productions P ⊆ V* × V*, the relation ⇒ is defined such that
 ∀ α, β, u, v ∈ V*, αuβ ⇒ αvβ iff u → v ∈ P is a production.
 Example:
 Sentence → NP VP     Adj → the
 NP → N               Adj → tall
 NP → Adj NP          Adj → jealous
 N → boy              VP → V NP
 N → girl             V → hit
                      V → bit
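 The definition of ⇒ translates almost literally into code: find an occurrence of a left part u inside the string and replace it by the right part v. Below is a minimal Python sketch, assuming strings over V are represented as tuples of symbol names; `derive_step` and `PRODUCTIONS` are names introduced here for illustration.

```python
def derive_step(form, productions):
    """Return every string obtainable from `form` by applying one production once."""
    results = set()
    for left, right in productions:
        width = len(left)
        for start in range(len(form) - width + 1):
            if form[start:start + width] == left:
                # form = alpha u beta  becomes  alpha v beta
                results.add(form[:start] + right + form[start + width:])
    return results


PRODUCTIONS = [
    (("Sentence",), ("NP", "VP")),
    (("NP",), ("N",)),
    (("NP",), ("Adj", "NP")),
    (("N",), ("boy",)),
    (("N",), ("girl",)),
    (("Adj",), ("the",)),
    (("Adj",), ("tall",)),
    (("Adj",), ("jealous",)),
    (("VP",), ("V", "NP")),
    (("V",), ("hit",)),
    (("V",), ("bit",)),
]

for successor in sorted(derive_step(("NP", "VP"), PRODUCTIONS)):
    print(" ".join(successor))
```

 From NP VP there are three one-step successors: N VP, Adj NP VP, and NP V NP.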
18Grammars
 Sentence ⇒ NP VP
          ⇒ Adj NP VP
          ⇒ the NP VP
          ⇒ the Adj NP VP
          ⇒ the jealous NP VP
          ⇒ the jealous N VP
          ⇒ the jealous girl VP
          ⇒ the jealous girl V NP
          ⇒ the jealous girl hit NP
          ⇒ the jealous girl hit Adj NP
          ⇒ the jealous girl hit the NP
          ⇒ the jealous girl hit the N
          ⇒ the jealous girl hit the boy
19Grammars
 Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S), where
 Φ is a finite set of nonterminals,
 Σ is a finite set of terminals,
 V = Φ ∪ Σ is the grammar's vocabulary,
 S ∈ Φ is called the start or goal symbol,
 and P ⊆ V* × V* is a finite set of productions.
 Example: Grammar for {a^n b^n | n ≥ 0}.
 G = (Φ, Σ, P, S), where
 Φ = {S},
 Σ = {a, b},
 and P = {S → aSb, S → ε}
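 The 4-tuple can be written down as a small data structure that checks the conditions in the definition. This is one possible Python sketch, assuming single-character symbols and ε written as the empty string; the class name `Grammar` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # Phi
    terminals: frozenset      # Sigma
    productions: tuple        # pairs (left, right) of strings over V
    start: str                # S

    @property
    def vocabulary(self):
        """V is the union of the nonterminals and the terminals."""
        return self.nonterminals | self.terminals

    def __post_init__(self):
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        for left, right in self.productions:
            # Every symbol of a production must come from V.
            if not set(left + right) <= self.vocabulary:
                raise ValueError(f"production {left!r} -> {right!r} uses symbols outside V")


# The example grammar for a^n b^n, n >= 0.
ANBN = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", "aSb"), ("S", "")),
    start="S",
)
print(sorted(ANBN.vocabulary))  # ['S', 'a', 'b']
```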
20Grammars
 Derivations:
 S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ ...
 ⇓    ⇓      ⇓        ⇓          ⇓
 ε   ab    aabb    aaabbb    aaaabbbb
 Note: Normally, grammars are given by simply listing the productions.
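 The derivation picture for this grammar can be reproduced step by step: applying S → aSb moves one step to the right, and applying S → ε drops down to a sentence. A short Python sketch, specific to this one grammar (the function name `derivation_table` is illustrative):

```python
def derivation_table(steps):
    """Pair each form a^n S b^n with the sentence obtained by applying S -> empty string."""
    rows = []
    form = "S"
    for _ in range(steps):
        rows.append((form, form.replace("S", "", 1)))   # apply S -> empty string
        form = form.replace("S", "aSb", 1)              # apply S -> aSb
    return rows


for form, sentence in derivation_table(4):
    print(f"{form:>9}  =>  {sentence!r}")
```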
21Grammar Conventions
 TWS convention:
 Upper-case letter (identifier) = nonterminal
 Lower-case letter (string) = terminal
 Lower-case Greek letter = string in V*
 The left part of the first production is assumed to be the start symbol, e.g.
   S → aSb
   S → ε
 The left part is omitted if it is the same as for the preceding production, e.g.
   S → aSb
     → ε
22Grammars
 Example: Grammar for identifiers.
 Identifier → Letter
            → Identifier Letter
            → Identifier Digit
 Letter → a   → A
        → b   → B
          .
          .
        → z   → Z
 Digit → 0
       → 1
         .
         .
       → 9
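 Read left to right, this grammar says that an identifier is a letter followed by any number of letters and digits. A hand-written recognizer for it might look like the Python sketch below; `is_identifier` is a hypothetical name, and the letters are assumed to be the 52 ASCII letters listed in the grammar.

```python
import string

LETTERS = set(string.ascii_letters)   # a..z and A..Z
DIGITS = set(string.digits)           # 0..9


def is_identifier(text):
    """Return True iff `text` is derivable from Identifier in the grammar above."""
    if not text or text[0] not in LETTERS:
        return False                  # the innermost production is Identifier -> Letter
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])


print(is_identifier("x1"), is_identifier("1x"))  # True False
```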
23Grammars
 Definition: The language generated by a grammar G is the set L(G) = {α ∈ Σ* | S ⇒* α}, where ⇒* denotes zero or more applications of ⇒.
 Definition: A sentential form generated by a grammar G is any string α such that S ⇒* α.
 Definition: A sentence generated by a grammar G is any sentential form α such that α ∈ Σ*.
24Grammars
 Example:
 S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ ...     (sentential forms)
 ⇓    ⇓      ⇓        ⇓          ⇓
 ε   ab    aabb    aaabbb    aaaabbbb            (sentences)
 Lemma: L(G) = {α | α is a sentence}.
 Proof: Trivial.
25Grammars
 Example:
  A → aABC
    → aBC
 aB → ab
 bB → bb
 bC → bc
 CB → BC
 cC → cc
26Grammars
 Derivations, where (2) marks two consecutive steps:
 A ⇒ aABC ⇒ aaABCBC ⇒ ...
 A ⇒ aBC ⇒ abC ⇒ abc
 aABC ⇒ aaBCBC ⇒ aabCBC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc
 aaABCBC ⇒ aaaBCBCBC ⇒ (2) aaaBBCBCC ⇒ aaaBBBCCC ⇒ aaabBBCCC ⇒ (2) aaabbbCCC ⇒ aaabbbcCC ⇒ (2) aaabbbccc
 L(G) = {a^n b^n c^n | n ≥ 1}
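 Because no production in this grammar shortens a string, the short sentences of L(G) can be found by a breadth-first search over sentential forms of bounded length. The Python sketch below is an illustration under that assumption; `sentences_up_to`, `PRODUCTIONS`, and `TERMINALS` are names chosen here, and symbols are single characters.

```python
from collections import deque

PRODUCTIONS = [
    ("A", "aABC"), ("A", "aBC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"),
    ("CB", "BC"), ("cC", "cc"),
]
TERMINALS = set("abc")


def sentences_up_to(max_length, start="A"):
    """Collect every sentence of length <= max_length reachable from `start`."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if set(form) <= TERMINALS:
            found.add(form)            # only terminals left: a sentence
        for left, right in PRODUCTIONS:
            position = form.find(left)
            while position != -1:      # try every occurrence of the left part
                successor = form[:position] + right + form[position + len(left):]
                if len(successor) <= max_length and successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
                position = form.find(left, position + 1)
    return found


print(sorted(sentences_up_to(9), key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```

 The length bound is what makes the search terminate; it would not be sufficient for a grammar with productions whose right part is shorter than the left part.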
27The Chomsky Hierarchy
 A hierarchy of grammars, the languages they generate, and the machines that accept those languages.
28The Chomsky Hierarchy
Type  Language Name               Grammar Name                   Restrictions on Grammar                   Accepting Machine
0     Recursively Enumerable      Unrestricted rewriting system  None                                      Turing Machine
1     Context-Sensitive Language  Context-Sensitive Grammar      For all α → β, |α| ≤ |β|                  Linear Bounded Automaton
2     Context-Free Language       Context-Free Grammar           For all α → β, α ∈ Φ                      Push-Down Automaton (parser)
3     Regular Language            Regular Grammar                For all α → β, α ∈ Φ, β ∈ Σ ∪ ΣΦ ∪ {ε}    Finite-State Automaton
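 The restrictions in the table can be checked mechanically for a given set of productions. The Python sketch below assumes the usual textbook forms of the restrictions: type 1 requires |α| ≤ |β|, type 2 requires α to be a single nonterminal, and type 3 additionally requires β to be a terminal, a terminal followed by a nonterminal, or ε. The right-linear form used for type 3 is an assumption, as are the function name `chomsky_type` and the single-character symbols.

```python
def chomsky_type(productions, nonterminals, terminals):
    """Return the highest type number (3, 2, 1, or 0) whose restriction all productions meet."""

    def type2(left, right):
        return left in nonterminals            # left part is exactly one nonterminal

    def type3(left, right):
        if left not in nonterminals:
            return False
        if len(right) <= 1:
            return right == "" or right in terminals
        return len(right) == 2 and right[0] in terminals and right[1] in nonterminals

    def type1(left, right):
        return len(left) <= len(right)         # productions never shrink the string

    for number, allowed in ((3, type3), (2, type2), (1, type1)):
        if all(allowed(left, right) for left, right in productions):
            return number
    return 0


anbn = [("S", "aSb"), ("S", "")]
print(chomsky_type(anbn, {"S"}, {"a", "b"}))  # 2
```

 This classifies a grammar, not a language: a language is regular if some type 3 grammar generates it, even when the grammar at hand is of a lower type.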
29Language Hierarchy
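 [Figure: nested language classes, with an example language at each level]
 0 Recursively Enumerable Languages
 1 Context-Sensitive Languages, e.g. {a^n b^n c^n | n ≥ 0}
 2 Context-Free Languages, e.g. {a^n b^n | n ≥ 0}
 3 Regular Languages, e.g. {a^n | n ≥ 0}
 English?
 We will deal with type 2 (syntax) and type 3 (lexicon) languages.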