Programming Languages Midterm
Rule for IfStatement is ambiguous:
"The else ambiguity is resolved by connecting an else with the last encountered else-less if."
Second generation languages
- Assembly languages. -Symbolic operation codes replaced binary operation codes. -Assembly language programs needed to be 'assembled' for execution by the computer. -Each assembly language instruction is translated into one machine language instruction. -Very efficient code and easier to write than machine language.
Fifth generation languages
- Declarative languages. - Functional, also called applicative (everything is a function). Examples: Lisp, SML. - Logic: based on mathematical logic; rule- or constraint-based. Example: Prolog.
What are the types of Programming languages ?
- First generation languages. Machine language: Operation code - such as addition or subtraction. Operands - identify the data to be processed. Machine language is machine dependent; it is the only language the computer can understand directly. It is very efficient code but very difficult to write.
principles of PL ( 4 properties)
- Syntax -Naming -Types -Semantics
Why are there so many programming languages ?
- Same theoretical power - Different practical power - Different programming languages are designed for different types of programs.
Fourth generation languages
-A high-level language that requires fewer instructions to accomplish a task than a third generation language. -Used with databases: - Query languages. - Report generators. - Forms designers. - Application generators.
A programming language is :
-A notational system for describing computation in a machine-readable and human-readable form. -A tool for developing executable models for a class of problem domains.
What is a Programming Language ?
-A programming language is a set of rules that provides a way of telling a computer what operations to perform. -A programming language is a set of rules for communicating an algorithm.
Third generation languages
-Closer to English but included simple mathematical notation. -Programs are written in source code, which must be translated into machine language programs called object code. -The translation of source code to object code is accomplished by a system program called a compiler. -An alternative to compilation is interpretation, which is accomplished by a system program called an interpreter. Examples: Visual Basic, Fortran, C and C++, Java.
Programming Language has a set of syntax rules
-English is a natural language; it has words, symbols and grammatical rules. -A programming language also has words, symbols and rules of grammar. -The grammatical rules are called syntax. -Each programming language has a different set of syntax rules.
Why Study PLs?
-Help choose a language - Make it easier to learn new languages - Help make better use of whatever language you use (implementation cost, obscure features)
Levels of programming languages
1-High level program. 2-Low level program. 3-Executable Machine code.
configuration
A configuration of an FSA consists of a state and the remaining input.
Deterministic FSA
A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.
Associativity and Precedence
A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -
Clarity about Binding
A language element is bound to a property at the time that property is defined for it. So a binding is the association between an object and a property of that object, e.g., a variable and its type.
move
A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If no such arc, then: - If no input and state is final, then accept. - Otherwise, error. An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.
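The configuration and move definitions can be sketched in Python. This is only an illustration, not something from the course material: the transition table `DELTA`, the state names, and the function `accepts` are hypothetical, and the automaton is assumed to recognize identifiers (a letter followed by letters or digits).

```python
# Hypothetical deterministic FSA for identifiers: letter (letter | digit)*
START = "start"
FINAL = {"ident"}

def symbol_class(ch):
    """Map a character to the arc label used in the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Deterministic: at most one target state per (state, label) pair.
DELTA = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}

def accepts(text):
    """Run the automaton; a configuration is (state, remaining input)."""
    state, remaining = START, text
    while remaining:
        key = (state, symbol_class(remaining[0]))
        if key not in DELTA:
            return False            # no arc for the leftmost symbol: error
        state = DELTA[key]          # a move: traverse the arc ...
        remaining = remaining[1:]   # ... and consume the symbol
    return state in FINAL           # input used up: accept only in a final state
```

For example, `accepts("x1")` is true, while `accepts("1x")` and `accepts("")` are false.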
Give language of grammar: S → aAb | aBb | aSb A → aA | a B → bB | b
A → aA | a generates one or more a's ---> a⁺. B → bB | b generates one or more b's ---> b⁺. So the grammar generates either more a's than b's (n > m) or more b's than a's (m > n), but never equal numbers. Language: aⁿ b^m, where n ≠ m and n, m ≥ 1
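Give language of grammar: S → aSb | aAb A → aA | ε
A → aA | ε generates any number of a's, including none --> a*. aAb adds at least one a and exactly one b, and aSb adds one of each, so the b's never outnumber the a's. Language: aⁿ b^m, where n ≥ m ≥ 1 (ab is the shortest string)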
Language Support
Accessible (public domain) compilers/interpreters • Good texts and tutorials • Wide community of users • Integrated with development environments (IDEs)
Context-free Grammars
BNF is a stylized form of CFG. - Equivalent to a pushdown automaton. - For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers.
Extended BNF (EBNF)
BNF: - recursion for iteration - nonterminals for grouping EBNF: additional metacharacters - { } for a series of zero or more - ( ) for a list, must pick one - [ ] for an optional list; pick none or one
Parser
Based on BNF/EBNF grammar • Input: tokens • Output: abstract syntax tree (parse tree) • Abstract syntax: parse tree with punctuation, many nonterminals discarded
Concrete Syntax
Based on a parse of its Tokens ; is a statement terminator
derivations
Consider the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 We can derive any unsigned integer, like 352, from this grammar.
Abstraction in Programming!
Data - Programmer-defined types/classes - Class libraries Procedural - Programmer-defined functions - Standard function libraries
regular grammar example:
Example Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
Binary Digits
Example of BNF. Consider the grammar: binaryDigit → 0 | 1. The | is a metacharacter that separates alternatives.
EBNF Examples
Expression is a list of one or more Terms separated by operators + and - Expression -> Term { ( + | - ) Term } IfStatement -> if ( Expression ) Statement [ else Statement ] C-style EBNF lists alternatives vertically and uses opt to signify optional parts. E.g., IfStatement: if ( Expression ) Statement ElsePartopt ElsePart: else Statement
Imperative paradigm
Follows the classic von Neumann-Eckert model: - Program and data are indistinguishable in memory - Program = a sequence of commands - State = values of all variables when program runs - Large programs use procedural abstraction
Example Tokens
Identifiers Literals: 123, 5.67, 'x', true Keywords: bool char ... Operators: + - * / ... Punctuation: ; , ( ) { }
Lexical Syntax
Input: a stream of characters from the ASCII set, keyed by a programmer. Output: a stream of tokens or basic symbols, classified as follows: - Identifiers e.g., Stack, x, i, push - Literals e.g., 123, 'x', 3.25, true - Keywords bool char else false float if int main true while - Operators = || && == != < <= > >= + - * / ! - Punctuation ; , { } ( )
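A minimal tokenizer sketch in Python for the token classes listed above. The names `TOKEN_RE`, `KEYWORDS`, and `tokenize` are hypothetical, `//` line comments are an assumption (the notes do not say what comment syntax the language uses), and `true`/`false` are reported as keywords here because they appear in the keyword list.

```python
import re

KEYWORDS = {"bool", "char", "else", "false", "float", "if",
            "int", "main", "true", "while"}

# Order matters: two-character operators come before their one-character prefixes.
TOKEN_RE = re.compile(r"""
    (?P<Whitespace>  \s+ | //[^\n]* )
  | (?P<Literal>     \d+\.\d+ | \d+ | '[^'\\]' )
  | (?P<Identifier>  [A-Za-z][A-Za-z0-9]* )
  | (?P<Operator>    \|\| | && | == | != | <= | >= | [=<>+\-*/!] )
  | (?P<Punctuation> [;,{}()] )
""", re.VERBOSE)

def tokenize(source):
    """Turn a character stream into a list of (class, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "Whitespace":
            continue                      # whitespace and comments are discarded
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"              # keywords look like identifiers lexically
        tokens.append((kind, text))
    return tokens
```

With this sketch, `tokenize(">=")` yields one Operator token, while `tokenize("> =")` yields two.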
Lexer
Input: characters Output: tokens • Separate: - Speed: 75% of time for non-optimizing compilers - Simpler design - Character sets - End of line conventions
Generators
Input: usually regular expression Output: table (slow), code C/C++: Lex, Flex Java: JLex
Parse tree for 352 as an Integer
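Integer( Integer( Integer( Digit( 3 ) ) Digit( 5 ) ) Digit( 2 ) ), i.e., the root Integer has children Integer and Digit (2); that inner Integer has children Integer and Digit (5); the innermost Integer has the single child Digit (3).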
Work out the rightmost derivation of 352 from the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer ⇒ Integer Digit ⇒ Integer 2 ⇒ Integer Digit 2 ⇒ Integer 5 2 ⇒ Digit 5 2 ⇒ 3 5 2
Work out the leftmost derivation of 352 from the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 3 Digit Digit ⇒ 3 5 Digit ⇒ 3 5 2
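The leftmost derivation can also be produced mechanically. The sketch below is an illustration only (the function name `leftmost_derivation` is hypothetical): it always rewrites the leftmost nonterminal, first growing the form with Integer → Integer Digit, then applying Integer → Digit, then replacing each Digit from left to right.

```python
def leftmost_derivation(digits):
    """Leftmost derivation of a non-empty digit string from
    Integer -> Digit | Integer Digit,  Digit -> 0 | ... | 9."""
    form = ["Integer"]
    steps = [form]
    # Integer -> Integer Digit, until the form has one symbol per digit.
    while len(form) < len(digits):
        form = ["Integer", "Digit"] + form[1:]
        steps.append(form)
    # Integer -> Digit for the remaining leftmost Integer.
    form = ["Digit"] + form[1:]
    steps.append(form)
    # Digit -> d, always rewriting the leftmost nonterminal.
    for i, d in enumerate(digits):
        form = form[:i] + [d] + form[i + 1:]
        steps.append(form)
    return " ⇒ ".join(" ".join(f) for f in steps)

print(leftmost_derivation("352"))
```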
two types of parsers
LL & LR
LL & LR Grammars
LL < LR: Some LR grammars cannot be parsed by LL parsers.
LL Parser
LL: left-to-right, leftmost derivation. Left-to-right: consume the input from left to right.
LR Parser
LR: left-to-right, rightmost derivation (constructed in reverse). Left-to-right: consume the input from left to right.
Levels of Syntax
Lexical syntax = all the basic symbols of the language (names, values, operators, etc.) Concrete syntax = rules for writing expressions, statements and programs. Abstract syntax = internal representation of the program, favoring content over form.
what is semantics?
Meaning of a language
what type of grammar is this? A = a A b | ε
Context-free, but not regular: it generates { aⁿ bⁿ | n ≥ 0 }, and a regular grammar cannot balance paired symbols such as ( ), { }, begin end. For { aⁿ bⁿ | n ≥ 1 } use: A = a B b B = a B b | ε
Syntactic Analysis
Phase also known as: parser Purpose is to recognize source structure Input: tokens Output: parse tree or abstract syntax tree A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal.
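A recursive descent parser can be sketched as one function per nonterminal. The example below assumes the small arithmetic grammar Expr → Term { (+ | -) Term }, Term → 0 | ... | 9 | ( Expr ); the function names and the tuple representation of the tree are hypothetical choices for illustration.

```python
def parse_expr(tokens, pos=0):
    """Expr -> Term { (+ | -) Term }; returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)      # fold to the left: left-associative
    return tree, pos

def parse_term(tokens, pos):
    """Term -> 0 | ... | 9 | ( Expr ); returns (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return int(tok), pos + 1
    if tok == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Parse a string of 1-digit integers, +, -, and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]  # one character per token
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

Here `parse("5 - 4 + 3")` returns `('+', ('-', 5, 4), 3)`, which reflects left associativity.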
Context-Sensitive Grammars
Production: α → β with |α| ≤ |β|, i.e., the left-hand side is no longer than the right-hand side. α, β ∈ (N ∪ T)*, i.e., the left-hand side can be a string of terminals and nonterminals
program structure of syntactic analysis
Program Structure consists of: Expressions: x + 2 * y Assignment Statement: z = x + 2 * y Loop Statements: while (i < n) a[i++] = 0; Function definitions Declarations: int i;
Lexical Analysis
Purpose: transform program representation Input: printable ASCII characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol
Chomsky Hierarchy
Regular grammar -- least powerful Context-free grammar (BNF) Context-sensitive grammar Unrestricted grammar
Abstract Syntax
Removes "syntactic sugar" and keeps essential elements of a language. E.g., consider the following two equivalent loops: Pascal: while i < n do begin i := i + 1; end; C/C++: while (i < n) { i = i + 1; } The only essential information in each of these is 1) that it is a loop, 2) that its terminating condition is i < n, and 3) that its body increments the current value of i.
Interpreter
Replaces last 2 phases of a compiler Input: - Mixed: intermediate code - Pure: stream of ASCII characters
Give grammar of language: aⁿ b^m, n ≥ 1, m ≥ 0
S → AB A → aA | a B → bB | ε
Identifier
Sequence of letters and digits, starting with a letter. - if is both an identifier and a keyword. - Most languages require identifiers to be distinct from keywords. - In some languages, keywords are merely predefined identifiers (and thus can be redefined by the programmer). **Redefining identifiers can be dangerous
BNF Grammar
Set of productions: P. Terminal symbols: T. Nonterminal symbols: N. Start symbol: S ∈ N. A production has the form A --> ω where A ∈ N and ω ∈ (N ∪ T)*
Finite State Automata
Set of states: representation - graph nodes Input alphabet + unique end symbol State transition function Labelled (using alphabet) arcs in graph Unique start state One or more final states
Why a Separate Phase of lexical?
Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs
Regular Grammar
Simplest; least powerful. Equivalent to: - Regular expression - Finite-state automaton. Right regular grammar: every production has the form A → ω B or A → ω, where ω ∈ T*. Used in construction of tokenizers. Less powerful than context-free grammars
Criteria in a good language design
Simplicity and Readability. Clarity about binding. Reliability. Support. Abstraction. Orthogonality. Efficient implementation.
What is syntax?
Structure and form (mechanics)
Backus-Naur Form (BNF)
Stylized version of a context-free grammar (cf. Chomsky hierarchy) • Sometimes called Backus Normal Form • First used to define syntax of Algol 60 • Now used to define syntax of most major languages
Reliability
The quality of a language that assures a program will not behave in unexpected or disastrous ways during execution. A language is reliable if: - Program behavior is the same on different platforms • E.g., early versions of Fortran - Type errors are detected • E.g., C vs Haskell - Semantic errors are properly trapped • E.g., C vs C++ - Memory leaks are prevented
Orthogonality
The quality that the features a language provides have as few restrictions as possible and can be combined in any meaningful way. A language is orthogonal if its features are built upon a small, mutually independent set of primitive operations. • Fewer exceptional rules = conceptual simplicity - E.g., restricting types of arguments to a function • Tradeoffs with efficiency
Simplicity and Readability.
The quality that enables a programmer to understand and comprehend the nature of a computation easily and accurately - ease of learning =ease of programming
Efficient Implementation
The quality that an efficient translator or interpreter can be written for the language; this is affected by the complexity of the language definition. • Embedded systems - Real-time responsiveness (e.g., navigation) - Failures of early Ada implementations • Web applications - Responsiveness to users (e.g., Google search) • Corporate database applications - Efficient search and updating • AI applications - Modeling human behaviors
Finding a More Efficient Tree
The shape of the parse tree reveals the meaning of the program. So we want a tree that removes its inefficiency and keeps its shape. - Remove separator/punctuation terminal symbols - Remove all trivial root nonterminals - Replace remaining nonterminals with leaf terminals
Object-oriented (OO) Paradigm
a collection of objects that interact by passing messages that transform the state. -Sending Messages - Inheritance - Polymorphism ex: Smalltalk, Java, C++, C#, and Ruby
what is a type?
a collection of values and a collection of operations on those values - simple types: numbers, characters, booleans - structural types: strings, lists, hash tables, trees
Metalanguage
a language used to define other languages
Grammar
a metalanguage used to define the syntax of a language
Give language of grammar: S → aSb | ε
always generates a b for every a generated: {ε, ab, aabb, aaabbb, ...} Language: aⁿbⁿ, n ≥ 0 (zero is needed b/c the grammar generates ε)
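A recognizer for this language can follow the production S → aSb | ε directly. The function name `in_anbn` is hypothetical and the sketch is only meant to mirror the grammar:

```python
def in_anbn(s):
    """True if s is in { a^n b^n | n >= 0 }, following S -> a S b | ε."""
    if s == "":
        return True                       # S -> ε
    # S -> a S b: strip one a from the front and one b from the back.
    return s[0] == "a" and s[-1] == "b" and in_anbn(s[1:-1])
```

So `in_anbn("aabb")` is true, while `in_anbn("aab")` and `in_anbn("abab")` are false.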
Ambiguous grammars
A grammar is ambiguous if one of its strings has two or more different parse trees.
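To see ambiguity concretely, the sketch below enumerates every parse tree of 5 - 4 + 3 under the ambiguous rule Expr → Expr Op Expr | Integer, restricted here to + and - without parentheses (an assumption made to keep the example short). The names `all_trees` and `evaluate` are hypothetical.

```python
def all_trees(tokens):
    """Yield every parse tree of Expr -> Expr Op Expr | Integer for the tokens."""
    if len(tokens) == 1:
        yield int(tokens[0])
        return
    # Operators sit at the odd positions; try each one as the root.
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                yield (tokens[i], left, right)

def evaluate(tree):
    """Evaluate a tree built from integers and the operators + and -."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b

trees = list(all_trees(["5", "-", "4", "+", "3"]))
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The two trees correspond to 5 - (4 + 3) = -2 and (5 - 4) + 3 = 4, so the shape of the tree changes the meaning.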
EBNF to BNF
An EBNF grammar can always be rewritten as a BNF grammar. E.g., A -> x { y } z can be rewritten: A -> x A' z A' -> ε | y A' (Rewriting EBNF rules with ( ), [ ] is left as an exercise.)
Give language of grammar: S → aS | bS | ε
can generate any combination of a's and b's, with minimum string ε, so language: (a + b)*. Use * if ε (empty string) is in the language! Here + means alternation (a or b), not concatenation.
Give grammar of language: aⁿ b^m where (n + m) is even
case 1: (even + even) = even → (aa)*(bb)*. case 2: (odd + odd) = even → a(aa)* b(bb)*. Using previous knowledge, A → aaA | ε gives an even number of a's, so: S → AB | aAbB A → aaA | ε B → bbB | ε. Now try the case where (n + m) is odd.
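A grammar answer like this one can be checked by brute force: generate every short string the grammar derives and compare with the intended language. The sketch below is an assumption-laden helper, not a general tool: `language` is a hypothetical name, nonterminals are single uppercase letters, "" stands for ε, and the cap on sentential-form length is a heuristic that suits the small grammars in these notes.

```python
from collections import deque

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Leftmost derivations are explored breadth-first."""
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        i = next((k for k, s in enumerate(form) if s.isupper()), None)
        if i is None:
            results.add(form)             # only terminals left: a sentence
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for s in new if not s.isupper())
            # Prune forms with too many terminals; the total-length cap
            # is a heuristic that guarantees termination.
            if terminals <= max_len and len(new) <= max_len + 3 and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

EVEN = {"S": ["AB", "aAbB"], "A": ["aaA", ""], "B": ["bbB", ""]}
expected = {"a" * n + "b" * m
            for n in range(7) for m in range(7)
            if (n + m) % 2 == 0 and n + m <= 6}
print(language(EVEN, "S", 6) == expected)
```

The same helper can be pointed at the other "give the grammar" answers by swapping in a different grammar dictionary and expected set.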
logical paradigm
declares what outcome the program should accomplish, rather than how it should be accomplished. - Programs as sets of constraints on a problem - Programs that achieve all possible solutions - Programs that are nondeterministic
Interpreter
executes instructions on a virtual machine
Give grammar of language: starting and ending with different symbol a ( a+b)* b | b ( a+b)* a | a | b
for any combination of a and b --> ( a + b) * : A → aA | bA | ε so .. S→ aAb | bAa | a | b A → aA | bA | ε
Give grammar of language: (aa)*
generates even number of a's and includes ε A → aaA | ε Note: (a)* any number of a's (aa)* even # of a's a(aa) * odd # of a's
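The three patterns in the note can be tried out with Python's `re.fullmatch`. The dictionary `PATTERNS` and the function `classify` are hypothetical names for this illustration.

```python
import re

PATTERNS = {
    "any": r"a*",        # (a)*   any number of a's
    "even": r"(aa)*",    # (aa)*  even number of a's, includes ε
    "odd": r"a(aa)*",    # a(aa)* odd number of a's
}

def classify(s):
    """Names of the patterns that match the whole string s."""
    return [name for name, pattern in PATTERNS.items()
            if re.fullmatch(pattern, s)]
```

For instance, `classify("aaa")` gives `['any', 'odd']` and `classify("")` gives `['any', 'even']`.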
Give grammar of language: aⁿ bⁿ, n ≥ 0
generates the same number of a's and b's: S → aSb | ε **whenever an a is generated, a b is generated too, so equal #s of a's and b's are produced
Give grammar of language: aⁿ, n ≥ 1
generating one or more a's: A → aA | a
Give grammar of language: bⁿ, n ≥ 0
generating the empty string and any number of b's: B → bB | ε
Relationships shown by the structure of the parse tree
highest precedence at the bottom, and left-associativity on the left at each level.
Parse Trees
A parse tree is a graphical representation of a derivation. Each internal node of the tree corresponds to a step in the derivation. The children of a node represent the right-hand side of a production. Each leaf node represents a symbol of the derived string, reading from left to right.
Whitespace
is any space, tab, end-of-line character (or characters), or character sequence inside a comment No token may contain embedded whitespace (unless it is a character or string literal) Example: >= one token > = two tokens
Give language of grammar: S → aS | a
minimum string is a; generates one or more a's. Language: { a⁺ }, ⁺ b/c it doesn't generate ε. Compare: S → bS | ε generates any number of b's, including ε, so use *: { b* }
Give language of grammar: S → aS | bS | a | b
minimum string is a or b (not ε): {a, b, bb, ab, aa, abb, aab, ...} Language: (a + b)⁺, use ⁺ b/c no ε
Give grammar of language: aⁿ b^m, n ≥ 3, m ≥ 2
minimum string is aaabb so have to be able to generate the minimum S→ aaaAbbB A → aA | ε B → bB | ε
Functional Paradigm!
models a computation as a collection of mathematical functions. - input = domain - output = range - Functional composition - Recursion
what is naming?
named entities such as variables, types, functions, parameters, classes, and objects are bound in a running program to scope, visibility, type and lifetime
paradigms
a pattern of problem-solving thought that underlies a particular genre of programs and languages. 4 main: imperative, object-oriented, functional, logic (declarative)
Compiler
produces machine code
Give grammar of language: aⁿ bⁿ c^m, n ≥ 1, m ≥ 0
same number of a and b and any number of c any # of c --> B → cB | ε ( include ε b/c zero, if 1 then c ) same # a and b ---> A → aAb | ab S→ AB A → aAb | ab B → cB | ε
Ambiguous Parse of 5-4+3 using grammar Expr -> Expr Op Expr | ( Expr ) | Integer Op -> + | - | * | / | % | **
see powerpoint
Parse of 4**2**3+5*6+7 Expr -> Expr + Term | Expr - Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> 0 | ... | 9 | ( Expr )
see powerpoint
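The grouping this grammar imposes on 4**2**3+5*6+7 can be checked with a small evaluator written as one function per nonterminal. This is a sketch and not the slide's parse tree: the name `evaluate` is hypothetical, left recursion is replaced by loops, `/` is assumed to be integer division, and characters the tokenizer does not recognize are silently skipped.

```python
import re

def evaluate(text):
    """Evaluate an expression of 1-digit integers with + - * / % ** ( )."""
    tokens = re.findall(r"\*\*|[-+*/%()]|\d", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():       # Expr -> Expr + Term | Expr - Term | Term (left-assoc)
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    def term():       # Term -> Term * Factor | Term / Factor | Term % Factor | Factor
        value = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            right = factor()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right           # assumption: integer division
            else:
                value %= right
        return value

    def factor():     # Factor -> Primary ** Factor | Primary (right-assoc)
        base = primary()
        if peek() == "**":
            advance()
            return base ** factor()
        return base

    def primary():    # Primary -> 0 | ... | 9 | ( Expr )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(tok)

    return expr()

print(evaluate("4**2**3+5*6+7"))
```

The result is 65573, i.e. (4 ** (2 ** 3)) + (5 * 6) + 7: ** binds tightest and groups to the right, then *, then + from the left.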
Parse of the String 5 - 4 +3 Expr → Expr + Term | Expr - Term | Term Term → 0 | ... | 9 | ( Expr )
see powerpoint
Abstract Syntax Tree for z = x + 2*y
see powerpoint lecture 3
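One way to look at such a tree without the slides is Python's own `ast` module, since z = x + 2*y happens to be valid Python. Note the assumptions: the node names (Assign, BinOp, Add, Mult) are Python's, not those of the course's language, and the helper `shape` is a hypothetical function that reduces the tree to nested tuples.

```python
import ast

def shape(node):
    """Reduce a Python AST node to a nested tuple (operator, left, right)."""
    if isinstance(node, ast.Assign):
        return ("=", shape(node.targets[0]), shape(node.value))
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, shape(node.left), shape(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")

statement = ast.parse("z = x + 2*y").body[0]
print(shape(statement))
```

The multiplication ends up deeper in the tree than the addition, and no punctuation or grouping symbols survive, only the essential structure.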
Arithmetic Expression Grammar
the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr → Expr + Term | Expr - Term | Term Term → 0 | ... | 9 | ( Expr )
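Give grammar of language: aⁿ b^m, n ≥ m
to generate extra a's: A → aA | ε. To generate ab, aab, ...: aAb. To generate the same # of a's and b's (aⁿ bⁿ): S → aSb | ε. So: S → aAb | aSb A → aA | ε. Note the shortest string is ab, so this grammar gives n ≥ m ≥ 1; to allow m = 0 as well, use S → aSb | A instead.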
Semantic Analysis
• Check that all identifiers are declared • Perform type checking • Insert implied conversion operators (i.e., make them explicit)
