Programming Languages
derivation of 352 as an integer
(rightmost) Integer -> Integer Digit -> Integer 2 -> Integer Digit 2 -> Integer 5 2 -> Digit 5 2 -> 3 5 2
Programming Domains
-Scientific applications: large number of floating-point computations, use of arrays (Fortran)
-Business applications: produce reports, use decimal numbers and characters (COBOL)
-Artificial intelligence: symbols rather than numbers manipulated; use of linked lists (LISP)
-Systems programming: need efficiency because of continuous use (C)
-Web software: eclectic collection of languages: markup (HTML), scripting (PHP), general-purpose (Java)
LL Parsers
-An LL parse is a left-to-right, leftmost derivation: we consider the input symbols from left to right and attempt to construct a leftmost derivation.
-LL parsers begin at the start symbol and try to apply productions to arrive at the target string.
-Top-down.
LR Parser
-An LR parse is a left-to-right, rightmost derivation (in reverse): we scan from left to right and attempt to construct a rightmost derivation.
-LR parsers begin at the target string and try to arrive back at the start symbol.
-Bottom-up.
Language Paradigms: Object-Oriented - Java
-An OO program is a collection of objects that interact by passing messages that transform the state.
-When studying OO, we learn about:
--Sending messages
--Inheritance
--Polymorphism
-Examples: Smalltalk, Java, C++, C#, and Python
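The three OO ideas above can be sketched in Java; the class names here (Shape, Circle, Square) are illustrative, not from the deck:

```java
// Inheritance: Circle and Square extend Shape.
// Polymorphism: calling area() ("sending a message") runs the
// implementation chosen by the object's actual class at run time.
class Shape {
    double area() { return 0; }
}

class Circle extends Shape {
    double r;
    Circle(double r) { this.r = r; }
    @Override
    double area() { return Math.PI * r * r; }
}

class Square extends Shape {
    double s;
    Square(double s) { this.s = s; }
    @Override
    double area() { return s * s; }
}

public class OoDemo {
    // The static type is Shape, but dispatch is on the dynamic type.
    static double describe(Shape sh) { return sh.area(); }
}
```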
difference between compiling and interpreting
-Compilation is a technique whereby a program written in one language (the "source language") is translated into a program in another language (the "object language"), which hopefully means the same thing as the original program. -Interpretation is a technique whereby another program, the interpreter, performs operations on behalf of the program being interpreted in order to run it.
Design Constraints
-Computer architecture -Technical setting -Standards -Legacy systems
Denotational Semantics
-Concerned with giving mathematical models of programming languages.
-Meanings for program phrases are defined abstractly as elements of some suitable mathematical structure.
-The machine is gone; the language is mathematics (lambda calculus).
-The semantic function [[−]] maps syntax to semantics: P → [[P]]
--A recursive program denotes a partial recursive function
--A Boolean circuit denotes a Boolean function
-Example: B[[true]] = λs ∈ State. true
Levels of abstraction in computing
-Data --Programmer-defined types/classes --Class libraries -Procedural --Programmer-defined functions --Standard function libraries
Operational Semantics
-Describes the meaning of a program by executing its statements on a machine, simulated or actual.
-The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement: e ⇒ n means expression e evaluates to n.
-Example rules (used to evaluate 1 + 2 * 3):

e1 ⇒ n1    e2 ⇒ n2
------------------
e1 + e2 ⇒ n1 + n2

e1 ⇒ n1    e2 ⇒ n2
------------------
e1 * e2 ⇒ n1 * n2
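A minimal sketch of the idea, assuming a state modeled as a variable-to-value map (the method names are invented for illustration): the evaluation rules above become functions, and a statement's meaning is the state change it produces.

```java
import java.util.HashMap;
import java.util.Map;

// Big-step operational semantics sketch: expressions evaluate to numbers,
// and executing an assignment transforms the machine state.
public class OpSem {
    // The addition and multiplication rules: given the values of the
    // subexpressions, the whole expression's value follows.
    static int evalAdd(int n1, int n2) { return n1 + n2; }
    static int evalMul(int n1, int n2) { return n1 * n2; }

    // Executing "x := e" yields a new state: state' = state[x -> n].
    static Map<String, Integer> assign(Map<String, Integer> state, String x, int n) {
        Map<String, Integer> next = new HashMap<>(state);
        next.put(x, n);
        return next;
    }
}
```

Evaluating 1 + 2 * 3 applies the multiplication rule first (yielding 6), then the addition rule (yielding 7).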
formal syntax vs syntax analysis
-Formal grammars are a tool for syntax, not semantics. We worry about semantics at a later point in the compiling process. -In the syntax analysis phase, we verify structure, not meaning.
Language Paradigms: functional - Lisp
-Functional programming models a computation as a collection of mathematical functions.
--Input = domain
--Output = range
-Functional languages are characterized by:
--Functional composition
--Recursion
-Example functional languages: Lisp, Scheme, ML, Haskell, ...
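The two characteristics above (composition and recursion) can be sketched even in Java using its function types; the names fact, inc, and dbl are made up for the example:

```java
import java.util.function.Function;

// Functional style: recursion instead of loops, and function composition.
public class FunDemo {
    // Recursion: factorial defined as a mathematical function.
    static int fact(int n) { return n <= 1 ? 1 : n * fact(n - 1); }

    // Composition: incThenDbl(x) = dbl(inc(x)).
    static Function<Integer, Integer> inc = x -> x + 1;
    static Function<Integer, Integer> dbl = x -> x * 2;
    static Function<Integer, Integer> incThenDbl = inc.andThen(dbl);
}
```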
Compilation process phases
-Lexical analysis: converts characters in the source program into lexical units
-Syntax analysis: transforms lexical units into parse trees which represent the syntactic structure of the program
-Semantic analysis: generates intermediate code
-Code generation: machine code is generated
Language Paradigms: Logical (declarative) - Prolog
-Logic programming declares what outcome the program should accomplish, rather than how it should be accomplished. -When studying logic programming we see: --Programs as sets of constraints on a problem --Programs that achieve all possible solutions --Programs that are nondeterministic -Example logic programming languages: Prolog
Backus Naur Form (BNF) grammar
-Stylized version of a context-free grammar
-First used to define the syntax of Algol 60
-Now used to define the syntax of most major languages
class-specifier → class identifier [ : identifier ] { member-declaration { member-declaration } }
member-declaration → identifier identifier function-definition | int identifier function-definition
semantic analysis in compilers and interpreters
-Syntax analyzer will just create parse tree. -Semantic Analyzer will check actual meaning of the statement parsed in parse tree. -Semantic analysis can compare information in one part of a parse tree to that in another part (e.g., compare reference to variable agrees with its declaration, or that parameters to a function call match the function definition).
Lexical Analyzer Generation
-Write a formal description of the tokens and use a software tool that constructs a table-driven lexical analyzer from such a description
-Design a state diagram that describes the tokens and write a program that implements the state diagram
-Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram
Attribute Grammars
-A context-free grammar G = (S, N, T, P): S = start symbol, N = nonterminals, T = terminals, P = productions
-For each grammar symbol x there is a set A(x) of attribute values
-Each rule has a set of functions that define certain attributes of the nonterminals in the rule
-Each rule has a (possibly empty) set of predicates to check for attribute consistency
Turing Machines: def, 3 parts, steps.
-A finite automaton equipped with an infinite tape as its memory
-The tape begins with the input to the machine written on it, surrounded by infinitely many blank cells
-The machine has a tape head that can read and write a single memory cell at a time
3 parts:
-Finite state control that issues commands
-Infinite tape for input and scratch space
-Tape head that can read and write a single tape cell
At each step, the machine:
-Writes a symbol to the tape cell under the tape head
-Changes state
-Moves the tape head to the left or right
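The step loop above can be sketched as a toy simulator. This hypothetical one-state machine flips every bit and halts on a blank cell (written '_' here); the encoding and names are assumptions for the example:

```java
// Toy Turing machine: at each step it writes a symbol, changes state
// (here there is only one working state), and moves the head right.
public class ToyTM {
    static String run(String input) {
        StringBuilder tape = new StringBuilder(input + "_"); // blank-terminated tape
        int head = 0;
        String state = "q0";
        while (!state.equals("halt")) {
            char sym = tape.charAt(head);
            if (sym == '0')      { tape.setCharAt(head, '1'); head++; } // write, move right
            else if (sym == '1') { tape.setCharAt(head, '0'); head++; }
            else                 { state = "halt"; }                    // blank: stop
        }
        return tape.substring(0, tape.length() - 1);                    // drop the blank
    }
}
```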
Finite State Machines
-A mathematical model of computation used to design both computer programs and sequential logic circuits.
-Conceived as an abstract machine that can be in one of a finite number of states.
-EX: automatic door
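The automatic-door example above can be sketched as a two-state machine; the state and event names (CLOSED/OPEN, SENSOR/TIMEOUT) are invented for illustration:

```java
// Finite state machine for an automatic door: a finite set of states,
// an input alphabet, and a state transition function.
public class DoorFsm {
    enum State { CLOSED, OPEN }
    enum Event { SENSOR, TIMEOUT }

    // Transition function: (current state, input symbol) -> next state.
    static State step(State s, Event e) {
        if (s == State.CLOSED && e == Event.SENSOR)  return State.OPEN;
        if (s == State.OPEN   && e == Event.TIMEOUT) return State.CLOSED;
        return s; // every other pair leaves the state unchanged
    }
}
```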
Grammars
-a set of production rules for strings in a formal language. -The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. -A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form.
syntactical analysis in compilers and interpreters
-Also known as the parser phase
-Purpose is to recognize the source structure
-Input: tokens
-Output: parse tree or abstract syntax tree
Concrete Syntax
-Based on a parse of its tokens (e.g., ; is a statement terminator)
-The rule for the if statement is ambiguous: "The else ambiguity is resolved by connecting an else with the last encountered else-less if."
Lambda expressions
-Describe nameless functions
-Expressions are applied to parameters by placing the parameters after the expression, e.g. (λx . x * x * x) 2 → 2 * 2 * 2 → 8
-Beta reduction: the process of evaluating a lambda expression
-Redex: a lambda expression which may be reduced
-Normal form: a lambda expression which may not be further reduced
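The cube example above can be written with a Java lambda; applying it to 2 plays the role of a beta reduction step (the name cube is made up for the example):

```java
import java.util.function.Function;

// A nameless function bound to a variable only for convenience:
// (λx. x * x * x) applied to 2 reduces to 8.
public class LambdaDemo {
    static Function<Integer, Integer> cube = x -> x * x * x;
}
```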
Ambiguous Grammars
-grammar is ambiguous if one of its strings has two or more different parse trees
Parse Trees
-graphical representation of a derivation -Each internal node of the tree corresponds to a step in the derivation -each child of a node represents a right-hand side of a production -each leaf node represents a symbol of the derived string, reading from left to right
lexical analysis in compilers and interpreters
-The process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified "meaning").
-A program that performs lexical analysis may be called a lexer, tokenizer, or scanner (though "scanner" is also used to refer to the first stage of a lexer).
Compiler
-produces machine code -Translate high-level program (source language) into machine code (machine language) -Slow translation, fast execution ex: source program -> compiler -> target program
finite automaton contents
-set of states (one start state and one or more final states) -alphabet of symbols - state transition function which defines how the automaton moves from one state to another on a symbol in the alphabet
pure interpretation
-source program and input -> interpreter -> output -source program and input -> software interpreter ->hardware interpreter -> output
drawing a syntax tree
-tree representation of the abstract syntactic structure of source code written in a programming language. -Each node of the tree denotes a construct occurring in the source code.
Lexical Syntax
-Uses a greedy algorithm
-Input: a stream of characters from the ASCII set, keyed by a programmer
-Output: a stream of tokens or basic symbols:
--Identifiers: stack, x, i, push
--Literals: 123, 'x', 3.25, true
--Keywords: bool, char, else, false, float, if, int, main, true, while
--Operators: =, ||, &&, ==, !=, <, <=, >, >=, +, -, *, /, !
--Punctuation: ; , { } ( )
pseudo code for jflex recursive descent parser
/**
 * This procedure corresponds to the L non-terminal:
 *   L -> 'end'
 *   L -> ';' S L
 */
public void L() {
    if (currentToken == END) {
        // Found the 'end' token. No other terminals or non-terminals
        // follow 'end' in this production, so get the next token in the
        // input stream and return to the procedure that called L().
        getNextToken();
        return;
    } else if (currentToken == SEMICOLON) {
        // Found ';'; get the next token in the input stream.
        getNextToken();
        // Two non-terminal symbols, S and L, follow ';' in this
        // production: call their procedures in the order they appear,
        // then return.
        S();
        L();
        return;
    } else {
        // currentToken is neither 'end' nor ';' -- this is an error.
        throw new IllegalTokenException(
            "Procedure L() expected an 'end' or ';' token " +
            "but received: " + currentToken);
    }
}
Hybrid Implementation Systems
A compromise between compilers and pure interpreters source -> compiler -> intermediate program and input -> virtual machine -> output
BNF vs Extended BNF (EBNF)
BNF: -recursion for iteration -nonterminals for grouping EBNF: - {} for a series of zero or more - () for a list, must pick one - [] for an optional list; pick none or one
abstract syntax tree
A binary node holds an operator and two subtrees: op, term1, term2
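The binary node above can be sketched as a small class hierarchy (the names Node, Num, and Binary are invented; only + and * are handled, as an assumption for brevity):

```java
// Abstract syntax tree sketch: leaves hold literals, binary nodes hold
// an operator and two subtrees; eval() gives a tree its meaning.
public class Ast {
    interface Node { int eval(); }

    static class Num implements Node {          // leaf: a literal
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
    }

    static class Binary implements Node {       // op, term1, term2
        final char op;
        final Node term1, term2;
        Binary(char op, Node term1, Node term2) {
            this.op = op; this.term1 = term1; this.term2 = term2;
        }
        public int eval() {
            int a = term1.eval(), b = term2.eval();
            return op == '+' ? a + b : a * b;
        }
    }
}
```

The tree for 1 + 2 * 3 is Binary('+', Num(1), Binary('*', Num(2), Num(3))).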
Language Paradigms: Imperative - C
Follows the classic von Neumann-Eckert model:
-Program and data are indistinguishable in memory
-Program = sequence of commands
-State = values of all variables when the program runs
-Large programs use procedural abstraction
Examples: COBOL, Fortran, C, Ada, Perl
context free Grammar rules:
1. <expression> --> number
2. <expression> --> ( <expression> )
3. <expression> --> <expression> + <expression>
4. <expression> --> <expression> - <expression>
5. <expression> --> <expression> * <expression>
6. <expression> --> <expression> / <expression>
derive: (number) + number * number
In our grammar for arithmetic expressions, the start symbol is <expression>, so our initial string is:
  <expression>
Using rule 5 we can choose to replace this nonterminal, producing the string:
  <expression> * <expression>
We now have two nonterminals to replace. We can apply rule 3 to the first nonterminal, producing the string:
  <expression> + <expression> * <expression>
We can apply rule 2 to the first nonterminal in this string to produce:
  (<expression>) + <expression> * <expression>
If we apply rule 1 to the remaining nonterminals (the recursion must end somewhere!), we get:
  (number) + number * number
Grammar models of syntax
Lexical syntax: rules for basic symbols (names, values, operators, etc.)
Concrete syntax: the actual representation
Abstract syntax: only the essential information, ignoring some details
difference between lexical, concrete and abstract syntax
Lexical syntax: rules for basic symbols (names, values, operators, etc.)
Concrete syntax: the actual representation
Abstract syntax: only the essential information, ignoring some details
lexical analysis
Purpose: transform program representation
Input: printable ASCII characters
Output: tokens
Discard: whitespace, comments
Abstract Syntax
Removes "syntactic sugar" and keeps essential elements of a language.
ex:
x := a+b;
y := a*b;
while (y > a) {
  a := a+1;
  x := a+b;
}
The only essential information is:
1. that it is a loop
2. that its terminating condition is y > a
3. that its body updates the current values of a and x
steps of compilation
Source program
  ↓ preprocessing
Source program
  ↓ compiling
Assembly program
  ↓ assembling
Machine instructions
  ↓ linking
Executable code
difference between operational and denotational semantics
The difference between denotational and operational semantics: -In operational semantics, the state changes are defined by coded algorithms for a virtual machine -In denotational semantics, they are defined by rigorous mathematical functions
drawing a parse tree
Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.
associativity and precedence
Unary - !    no associativity
* /          left associativity
+ -          left associativity
< <= > >=    no associativity
== !=        no associativity
&&           left associativity
||           left associativity
(listed from highest to lowest precedence)
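The table above can be checked against Java's own operators, which follow the same precedence and associativity for these levels (the method names are made up for the example):

```java
// Precedence: * binds tighter than +, so 1 + 2 * 3 is 1 + (2 * 3).
// Associativity: - at one level groups left, so 10 - 4 - 3 is (10 - 4) - 3.
public class PrecDemo {
    static int mixed()     { return 1 + 2 * 3; }
    static int leftAssoc() { return 10 - 4 - 3; }
}
```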
Syntax
The syntax of a language is a precise description of all its grammatically correct programs.
Ex: grammar for the language, basic vocabulary, how syntax errors are detected.
Token name
category of lexical unit (keyword, identifier, integer, operator, etc)
BNF grammar
class-specifier → class identifier [ : identifier ] { member-declaration { member-declaration } }
member-declaration → identifier identifier function-definition | int identifier function-definition
A production has the form A → ω, where A ∈ N and ω ∈ (N ∪ T)*
structure of jflex prog
declarations
%%
translation rules
%%
auxiliary functions

-Declarations include variables and manifest constants
-Translation rules are of the form: Pattern {Action}
pattern
description of the form that the lexemes may take
Interpreter
-Executes instructions on a virtual machine
ex: source program and input -> interpreter -> output
binding
The association between an object and a property of that object.
Examples:
-a variable and its type
-a variable and its value
-Early binding takes place at compile time
-Late binding takes place at run time
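A small Java sketch of early vs. late binding (the Animal/Dog names are invented): the static type of the parameter is fixed at compile time, but which speak() runs is chosen at run time.

```java
// Late binding: the call a.speak() dispatches on the object's actual
// class at run time, even though the declared type is Animal.
public class BindDemo {
    static class Animal { String speak() { return "..."; } }
    static class Dog extends Animal { String speak() { return "woof"; } }

    static String call(Animal a) { return a.speak(); }
}
```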
Lexical Analyzer Generation in C
lex source program (lex.l) -> lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
input stream -> a.out -> sequence of tokens
front end of a compiler
lexical analyzer -> syntax analyzer -> semantic analyzer -> intermediate code generator -> code optimization -> code generator
The lexical analyzer sends tokens to the syntax analyzer, which sends a parse tree to the semantic analyzer, which confirms correct semantics and passes the parse tree to the intermediate code generator, which sends the translated parse tree to the code optimizer and then the code generator.
Von Neumann Architecture
parts consisting of: - a processing unit containing an arithmetic logic unit and processor registers; -a control unit containing an instruction register and program counter; -a memory to store both data and instructions; -external mass storage; -input and output mechanisms
program paradigms
A pattern of problem-solving thought that underlies a particular genre of programs and languages.
4 main paradigms:
-imperative
-object-oriented
-functional
-logic (declarative)
pseudo code for jflex token specification
program -> int main ( ) { declarations statements }
declarations -> { declaration }
declaration -> type identifier [[ integer ]] { , identifier [[ integer ]] }
type -> int | bool | float | char
statements -> { statement }
statement -> ; | block | assignment | ifStatement | whileStatement
block -> { statements }
assignment -> identifier [[ expression ]] = expression ;
ifStatement -> if ( expression ) statement [ else statement ]
whileStatement -> while ( expression ) statement
jflex token class pseudo code
public enum TokenClass {
    EOF,
    // keywords
    BOOL, CHAR, ELSE, FLOAT, IF, INT, MAIN, WHILE,
    // punctuation
    COMMA, SEMICOLON, LBRACE, RBRACE,
    // operators
    LPAREN, RPAREN, LBRACK, RBRACK, ASSIGN, OR, AND, EQ, NE,
    LT, LE, GT, GE, PLUS, MINUS, TIMES, SLASH, MOD, NOT,
    // ids and literals
    ID, INTEGER, BOOLEAN, FLOATLIT, CHARLIT
}
regular expressions vs lexical tokens
input characters (matched against regular expressions): b = 2 + a*10
lexical tokens: ['b', '=', '2', '+', 'a', '*', '10']
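A toy tokenizer for the example above, assuming a tiny language with only identifiers, integer literals, and the operators =, +, *; the pattern and class name are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regular expressions describe the forms tokens may take; scanning the
// input left to right yields the sequence of lexemes.
public class ToyLexer {
    // identifier | integer literal | single operator character
    static final Pattern TOKEN = Pattern.compile("[a-zA-Z]\\w*|\\d+|[=+*]");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }
}
```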
parse tree derivations
Rightmost derivation: the rightmost nonterminal is replaced at each step
Leftmost derivation: the leftmost nonterminal is replaced at each step
Lexemes
sequence of characters making up a token instance
The compilation process
source program -> lexical analyzer (sends lexical units) -> syntax analyzer (sends parse tree) -> intermediate code generator and semantic analyzer (sends intermediate code) -> code generator (sends machine language) -> computer (also receives input data), which returns results
-Lexical and syntax analyzers also fill the symbol table
-The symbol table feeds the intermediate code generator and the code generator
-Optimization is optional between intermediate code generation and code generation
Token attribute value
specific representation (if, myVar, 602, +, etc.)
Lexical analyzer generation in java
specification of tokens(clite.jflex)-> jflex -> lexical analyzer class (cliteLexer.java)
Just-in-Time Implementation Systems
-Translate programs to an intermediate language, then compile the intermediate language of a subprogram to machine code when it is first called; the machine code is kept for subsequent calls
-JIT systems are widely used; Java and .NET languages are implemented with JIT
Chomsky Hierarchy
Type 0 - recursively enumerable
Type 1 - context-sensitive
Type 2 - context-free
Type 3 - regular