Programming Languages Midterm

Rule for IfStatement is ambiguous:

"The else ambiguity is resolved by connecting an else with the last encountered else-less if."

Second generation languages

- Assembly languages. - Symbolic operation codes replaced binary operation codes. - Assembly language programs needed to be 'assembled' for execution by the computer. - Each assembly language instruction is translated into one machine language instruction. - Very efficient code and easier to write than machine language.

Fifth generation languages

- Declarative languages. - Functional: Lisp, SML. Also called applicative: everything is a function. - Logic: Prolog. Based on mathematical logic. - Rule- or constraint-based.

What are the types of Programming languages ?

- First generation languages. Machine language: - Operation codes, such as addition or subtraction. - Operands, which identify the data to be processed. Machine language is machine dependent, and it is the only language the computer can understand directly. It is very efficient code but very difficult to write.

principles of PL ( 4 properties)

- Syntax -Naming -Types -Semantics

Why are there so many programming languages ?

- Same theoretical power. - Different practical power. - Different programming languages are designed for different types of programs.

Fourth generation languages

- A high-level generation that requires fewer instructions to accomplish a task than a third generation language. - Used with databases: - Query languages. - Report generators. - Forms designers. - Application generators.

A programming language is :

- A notational system for describing computation in a machine-readable and human-readable form. - A tool for developing executable models for a class of problem domains.

What is a Programming Language ?

-A programming language is a set of rules that provides a way of telling a computer what operations to perform. -A programming language is a set of rules for communicating an algorithm.

Third generation languages

- Closer to English but included simple mathematical notation. - Programs are written in source code, which must be translated into machine language programs called object code. - The translation of source code to object code is accomplished by a system program called a compiler. - The alternative to compilation is interpretation, which is accomplished by a system program called an interpreter. Examples: Visual Basic, Fortran, C and C++, Java.

Programming Language has a set of syntax rules

- English is a natural language; it has words, symbols and grammatical rules. - A programming language also has words, symbols and rules of grammar. - The grammatical rules are called syntax. - Each programming language has a different set of syntax rules.

Why Study PLs?

- Help choose a language - Make it easier to learn new languages - Help make better use of whatever language you use (implementation cost, obscure features)

Levels of programming languages

1-High level program. 2-Low level program. 3-Executable Machine code.

configuration

A configuration of an FSA consists of a state and the remaining input.

Deterministic FSA

A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.

Associativity and Precedence

A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -

Clarity about Binding

A language element is bound to a property at the time that property is defined for it. So a binding is the association between an object and a property of that object, e.g., a variable and its type.

move

A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If there is no such arc, then: - If there is no input left and the state is final, then accept. - Otherwise, error. An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.
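The configuration and move definitions can be sketched as a small simulator. This is only an illustration: the automaton below (unsigned integers, one or more digits) and the names START, IN_INT, TRANSITIONS and accepts are assumptions made for the example, not part of the course material.

```python
# Hypothetical deterministic FSA that accepts unsigned integers (one or more digits).
DIGITS = "0123456789"
START, IN_INT = "start", "in_int"
FINAL_STATES = {IN_INT}

# Transition function: (state, input symbol) -> next state.
# There is at most one arc per (state, symbol), so the automaton is deterministic.
TRANSITIONS = {}
for d in DIGITS:
    TRANSITIONS[(START, d)] = IN_INT
    TRANSITIONS[(IN_INT, d)] = IN_INT


def accepts(text):
    """Run the automaton; a configuration is the pair (state, remaining input)."""
    state, remaining = START, text
    while remaining:
        arc = (state, remaining[0])
        if arc not in TRANSITIONS:
            return False  # no arc for the leftmost symbol: error
        # A move: follow the arc and consume the leftmost input symbol.
        state, remaining = TRANSITIONS[arc], remaining[1:]
    # No input left: accept only if the automaton halted in a final state.
    return state in FINAL_STATES
```

With this table, accepts("352") passes through the configurations (start, "352"), (in_int, "52"), (in_int, "2"), (in_int, "") and accepts.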

Give language of grammar: S → aAb | aBb | aSb A → aA | a B → bB | b

A → aA | a generates one or more a's (a⁺), and B → bB | b generates one or more b's (b⁺). Each string therefore has either more a's than b's (n > m) or more b's than a's (m > n), never equal numbers. Language: aⁿ b^m, where n ≠ m and n, m ≥ 1

Give language of grammar: S → aSb | aAb A → aA | ε

A → aA | ε generates zero or more a's (a*). S → aAb contributes one b and at least one a, and each use of S → aSb adds one a and one b, so there are at least as many a's as b's. Language: aⁿ b^m, where n ≥ m ≥ 1

Language Support

Accessible (public domain) compilers/interpreters • Good texts and tutorials • Wide community of users • Integrated with development environments (IDEs)

Context-free Grammars

BNF a stylized form of CFG Equivalent to a pushdown automaton For a wide class of unambiguous CFGs, there are table-driven, linear time parsers

Extended BNF (EBNF)

BNF: - recursion for iteration - nonterminals for grouping EBNF: additional metacharacters - { } for a series of zero or more - ( ) for a list, must pick one - [ ] for an optional list; pick none or one

Parser

Based on BNF/EBNF grammar • Input: tokens • Output: abstract syntax tree (parse tree) • Abstract syntax: parse tree with punctuation, many nonterminals discarded

Concrete Syntax

Based on a parse of its Tokens ; is a statement terminator

derivations

Consider the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 We can derive any unsigned integer, like 352, from this grammar.

Abstraction in Programming!

Data - Programmer-defined types/classes - Class libraries Procedural - Programmer-defined functions - Standard function libraries

Regular grammar example:

Example Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9

Binary Digits

Example of BNF. Consider the grammar: binaryDigit → 0 | 1. The | is a metacharacter that separates alternatives.

EBNF Examples

Expression is a list of one or more Terms separated by operators + and - Expression -> Term { ( + | - ) Term } IfStatement -> if ( Expression ) Statement [ else Statement ] C-style EBNF lists alternatives vertically and uses opt to signify optional parts. E.g., IfStatement: if ( Expression ) Statement ElsePartopt ElsePart: else Statement

Imperative paradigm

Follows the classic von Neumann-Eckert model: - Program and data are indistinguishable in memory - Program = a sequence of commands - State = values of all variables when program runs - Large programs use procedural abstraction

Example Tokens

Identifiers Literals: 123, 5.67, 'x', true Keywords: bool char ... Operators: + - * / ... Punctuation: ; , ( ) { }

Lexical Syntax

Input: a stream of characters from the ASCII set, keyed by a programmer. Output: a stream of tokens or basic symbols, classified as follows: - Identifiers e.g., Stack, x, i, push - Literals e.g., 123, 'x', 3.25, true - Keywords bool char else false float if int main true while - Operators = || && == != < <= > >= + - * / ! - Punctuation ; , { } ( )

Lexer

Input: characters Output: tokens • Separate: - Speed: 75% of time for non-optimizing compilers - Simpler design - Character sets - End of line conventions

Generators

Input: usually regular expression Output: table (slow), code C/C++: Lex, Flex Java: JLex

Parse tree for 352 as an Integer

            Integer
            /     \
       Integer    Digit
       /     \      |
  Integer   Digit   2
     |        |
   Digit      5
     |
     3

Work out a rightmost derivation of 352 from the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Integer ⇒ Integer Digit ⇒ Integer 2 ⇒ Integer Digit 2 ⇒ Integer 5 2 ⇒ Digit 5 2 ⇒ 3 5 2

Work out a leftmost derivation of 352 from the grammar: Integer → Digit | Integer Digit Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 3 Digit Digit ⇒ 3 5 Digit ⇒ 3 5 2
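The leftmost derivation can be reproduced mechanically. The sketch below is an assumption-laden illustration (the function name leftmost_derivation is made up for this example); at every step it rewrites only the leftmost nonterminal of the grammar Integer → Digit | Integer Digit.

```python
def leftmost_derivation(digits):
    """Return the sentential forms of a leftmost derivation of `digits`."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of digits")
    form = ["Integer"]
    steps = [" ".join(form)]
    # Integer -> Integer Digit, applied to the leftmost symbol
    # until there is one grammar symbol per input digit.
    while len(form) < len(digits):
        form = ["Integer", "Digit"] + form[1:]
        steps.append(" ".join(form))
    # Integer -> Digit
    form = ["Digit"] + form[1:]
    steps.append(" ".join(form))
    # Digit -> 0 | ... | 9, replacing the leftmost Digit each time.
    for i, d in enumerate(digits):
        form = form[:i] + [d] + form[i + 1:]
        steps.append(" ".join(form))
    return steps
```

Joining the result for "352" with " ⇒ " gives the same chain of sentential forms as the card.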

two types of parsers

LL & LR

LL & LR Grammars

LL < LR: Some LR grammars cannot be parsed by LL parsers.

LL Parser

LL: left-to-right, leftmost-derivation left-to-right: consume the input from the left to right.

LR Parser

LR: left-to-right, rightmost-derivation (constructed in reverse). left-to-right: consume the input from the left to right.

Levels of Syntax

Lexical syntax = all the basic symbols of the language (names, values, operators, etc.) Concrete syntax = rules for writing expressions, statements and programs. Abstract syntax = internal representation of the program, favoring content over form.

What is semantics?

Meaning of a language

What type of grammar is this? A = a A b | ε

Context-free, but not regular: it generates { aⁿ bⁿ | n ≥ 0 }, and a regular grammar cannot balance symbols such as ( ), { }, begin end. For { aⁿ bⁿ | n ≥ 1 }, use A = a B b B = a B b | ε

Syntactic Analysis

Phase also known as: parser Purpose is to recognize source structure Input: tokens Output: parse tree or abstract syntax tree A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal.
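A recursive descent parser can be sketched as follows. This is a minimal illustration, not the course's implementation: it assumes the one-digit expression grammar in its EBNF form, Expr → Term { ( + | - ) Term } and Term → 0 | ... | 9 | ( Expr ), and the names Parser and evaluate are invented here. Each nonterminal becomes one method.

```python
DIGITS = set("0123456789")


class Parser:
    """One method per nonterminal; tokens are single non-blank characters."""

    def __init__(self, text):
        self.tokens = [c for c in text if not c.isspace()]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # Expr -> Term { ( + | - ) Term }; the loop makes + and - left-associative.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # Term -> 0 | ... | 9 | ( Expr )
        tok = self.advance()
        if tok in DIGITS:
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

The EBNF form is used because the left-recursive rule Expr → Expr + Term would make a naive recursive descent function call itself forever.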

Context-Sensitive Grammars

Production: α → β with |α| ≤ |β| (the left-hand side is no longer than the right-hand side), where α, β ∈ (N ∪ T)*; i.e., the left-hand side can be composed of strings of terminals and nonterminals

program structure of syntactic analysis

Program Structure consists of: Expressions: x + 2 * y Assignment Statement: z = x + 2 * y Loop Statements: while (i < n) a[i++] = 0; Function definitions Declarations: int i;

Lexical Analysis

Purpose: transform program representation Input: printable ASCII characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol
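A lexer along these lines can be sketched with regular expressions. The token classes follow the cards, but the exact patterns, the `//` comment style, and the choice to classify true and false as keywords are assumptions made for this example.

```python
import re

KEYWORDS = {"bool", "char", "else", "false", "float", "if", "int", "main", "true", "while"}

# Order matters: comments must be tried before the "/" operator.
TOKEN_SPEC = [
    ("Whitespace", r"\s+"),
    ("Comment", r"//[^\n]*"),
    ("Literal", r"[0-9]+\.[0-9]+|[0-9]+|'[^']'"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Operator", r"\|\||&&|==|!=|<=|>=|[=<>+\-*/!]"),
    ("Punctuation", r"[;,{}()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("Whitespace", "Comment"):
            continue  # discarded, as the card says
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"
        tokens.append((kind, text))
    return tokens
```

Because whitespace ends a token, the input ">=" yields one operator token while "> =" yields two.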

Chomsky Hierarchy

Regular grammar -- least powerful Context-free grammar (BNF) Context-sensitive grammar Unrestricted grammar

Abstract Syntax

Removes "syntactic sugar" and keeps essential elements of a language. E.g., consider the following two equivalent loops: Pascal: while i < n do begin i := i + 1; end; C/C++: while (i < n) { i = i + 1; } The only essential information in each of these is 1) that it is a loop, 2) that its terminating condition is i < n, and 3) that its body increments the current value of i.

Interpreter

Replaces last 2 phases of a compiler Input: - Mixed: intermediate code - Pure: stream of ASCII characters

Give grammar of language: aⁿ b^m, n ≥ 1, m ≥ 0

S → AB A → aA | a B → bB | ε

Identifier

Sequence of letters and digits, starting with a letter. - "if" is both an identifier and a keyword. - Most languages require identifiers to be distinct from keywords. - In some languages, identifiers are merely predefined (and thus can be redefined by the programmer). ** Redefining identifiers can be dangerous.

BNF Grammar

Set of productions: P. Terminal symbols: T. Nonterminal symbols: N. Start symbol: S ∈ N. A production has the form A → ω where A ∈ N and ω ∈ (N ∪ T)*

Finite State Automata

Set of states: representation - graph nodes Input alphabet + unique end symbol State transition function Labelled (using alphabet) arcs in graph Unique start state One or more final states

Why a Separate Phase of lexical?

Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs

Regular Grammar

Simplest; least powerful Equivalent to: - Regular expression - Finite-state automaton right regular grammar: equivalent A → ω B A → ω Used in construction of tokenizers Less powerful than context-free grammars

Criteria in a good language design

Simplicity and Readability. Clarity about binding. Reliability. Support. Abstraction. Orthogonality. Efficient implementation.

What is syntax?

Structure and form (mechanics)

Backus-Naur Form (BNF)

Stylized version of a context-free grammar (cf. Chomsky hierarchy) • Sometimes called Backus Normal Form • First used to define syntax of Algol 60 • Now used to define syntax of most major languages

Reliability

The quality of a language that assures a program will not behave in unexpected or disastrous ways during execution. A language is reliable if: - Program behavior is the same on different platforms (e.g., early versions of Fortran) - Type errors are detected (e.g., C vs Haskell) - Semantic errors are properly trapped (e.g., C vs C++) - Memory leaks are prevented

Orthogonality

The quality of a language that the features provided have as few restrictions as possible and can be combined in any meaningful way. A language is orthogonal if its features are built upon a small, mutually independent set of primitive operations. • Fewer exceptional rules = conceptual simplicity - E.g., restricting types of arguments to a function • Tradeoffs with efficiency

Simplicity and Readability.

The quality that enables a programmer to understand and comprehend the nature of a computation easily and accurately - ease of learning =ease of programming

Efficient Implementation

The quality that an efficient translator or interpreter can be written for the language; this depends on the complexity of the language definition. • Embedded systems - Real-time responsiveness (e.g., navigation) - Failures of early Ada implementations • Web applications - Responsiveness to users (e.g., Google search) • Corporate database applications - Efficient search and updating • AI applications - Modeling human behaviors

Finding a More Efficient Tree

The shape of the parse tree reveals the meaning of the program. So we want a tree that removes its inefficiency and keeps its shape. - Remove separator/punctuation terminal symbols - Remove all trivial root nonterminals - Replace remaining nonterminals with leaf terminals

Object-oriented (OO) Paradigm

a collection of objects that interact by passing messages that transform the state. -Sending Messages - Inheritance - Polymorphism ex: Smalltalk, Java, C++, C#, and Ruby

what is types?

a collection of values and a collection of operations on those values - simple types: numbers, characters, booleans - structural types: strings, lists, hash tables, trees

Metalanguage

a language used to define other languages

Grammar

a metalanguage used to define the syntax of a language

Give language of grammar: S → aSb | ε

Always generates a b for every a generated: {ε, ab, aabb, aaabbb, ...}. Language: aⁿbⁿ, n ≥ 0 (zero is needed because the grammar generates ε)
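The two productions translate directly into a recursive membership test. This is a small sketch for illustration; the function name in_language is an assumption of the example.

```python
def in_language(s):
    """Recognize strings derivable from S -> a S b | ε."""
    if s == "":
        return True  # S -> ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return in_language(s[1:-1])  # S -> a S b: strip one a and one matching b
    return False
```

Each recursive call peels off one a from the front and one b from the back, which mirrors how the grammar produces them in pairs.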

Ambiguous grammars

A grammar is ambiguous if one of its strings has two or more different parse trees.

EBNF to BNF

An EBNF grammar can always be rewritten as a BNF grammar. E.g., A -> x { y } z can be rewritten: A -> x A' z A' -> ε | y A' (Rewriting EBNF rules with ( ), [ ] is left as an exercise.)

Give language of grammar: S → aS | bS | ε

Can generate any combination of a's and b's, with minimum string ε, so the language is (a + b)*. Use * when ε (the empty string) is included in the language. Here + means "or" (a choice between a and b), not concatenation.

Give grammar of language: aⁿ b^m where (n + m) is even

Case 1: (even + even) = even → (aa)*(bb)*. Case 2: (odd + odd) = even → a(aa)* b(bb)*. From before, A → aaA | ε gives an even number of a's, so: S → AB | aAbB A → aaA | ε B → bbB | ε. Exercise: now try the case where (n + m) is odd.

logical paradigm

declares what outcome the program should accomplish, rather than how it should be accomplished. - Programs as sets of constraints on a problem - Programs that achieve all possible solutions - Programs that are nondeterministic

Interpreter

executes instructions on a virtual machine

Give grammar of language: strings starting and ending with different symbols, a (a + b)* b | b (a + b)* a

For any combination of a's and b's, (a + b)*: A → aA | bA | ε. The single-symbol strings a and b start and end with the same symbol, so they are not in the language. So: S → aAb | bAa A → aA | bA | ε

Give grammar of language: (aa)*

Generates an even number of a's and includes ε: A → aaA | ε. Note: (a)* is any number of a's, (aa)* is an even number of a's, a(aa)* is an odd number of a's

Give grammar of language: aⁿ bⁿ, n ≥ 0

Generates the same number of a's and b's: S → aSb | ε. ** Whenever an a is generated, a b is generated too, so equal numbers of a's and b's are produced.

Give grammar of language: aⁿ, n ≥ 1

Generates one or more a's: A → aA | a

Give grammar of language: bⁿ, n ≥ 0

Generates the empty string and any number of b's: B → bB | ε

Relationships shown by the structure of the parse tree

highest precedence at the bottom, and left-associativity on the left at each level.

Parse Trees

A parse tree is a graphical representation of a derivation. Each internal node of the tree corresponds to a step in the derivation. The children of a node represent the right-hand side of a production. Each leaf node represents a symbol of the derived string, reading from left to right.

Whitespace

Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment. No token may contain embedded whitespace (unless it is a character or string literal). Example: >= is one token; > = is two tokens.

Give language of grammar: S → aS | a

Minimum string is a; generates one or more a's. Language: { a⁺ }; use ⁺ because the grammar does not generate ε. Compare S → bS | ε, which generates ε and any number of b's, so use *: { b* }

Give language of grammar: S → aS | bS | a | b

Minimum string is a or b (not ε): {a, b, bb, ab, aa, abb, aab, ...}. Language: (a + b)⁺; use ⁺ because there is no ε

Give grammar of language: aⁿ b^m, n ≥ 3, m ≥ 2

The minimum string is aaabb, so the grammar must be able to generate it: S → aaaAbbB A → aA | ε B → bB | ε

Functional Paradigm!

models a computation as a collection of mathematical functions. - Input = domain - Output = range - Functional composition - Recursion

what is naming?

named entities such as variables, types, functions, parameters, classes, and objects are bound in a running program to scope, visibility, type and lifetime

paradigms

a pattern of problem-solving thought that underlies a particular genre of programs and languages. 4 main: imperative, object-oriented, functional, logic (declarative)

Compiler

produces machine code

Give grammar of language: aⁿ bⁿ c^m, n ≥ 1, m ≥ 0

Same number of a's and b's, then any number of c's. Any number of c's: B → cB | ε (include ε because m can be zero; if m ≥ 1, end with c instead). Same number of a's and b's: A → aAb | ab. So: S → AB A → aAb | ab B → cB | ε

Ambiguous Parse of 5-4+3 using grammar Expr -> Expr Op Expr | ( Expr ) | Integer Op -> + | - | * | / | % | **

see powerpoint
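As a rough sketch of why the ambiguity matters: with Expr -> Expr Op Expr, the string 5-4+3 can be grouped as (5-4)+3 or as 5-(4+3). The nested-tuple tree encoding and the names below are assumptions made for this illustration, not taken from the slides.

```python
# Trees as nested tuples: (operator, left subtree, right subtree); leaves are ints.
LEFT_GROUPED = ("+", ("-", 5, 4), 3)   # (5 - 4) + 3
RIGHT_GROUPED = ("-", 5, ("+", 4, 3))  # 5 - (4 + 3)


def evaluate_tree(tree):
    """Evaluate a tree bottom-up; only + and - are handled in this sketch."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate_tree(left), evaluate_tree(right)
    return a + b if op == "+" else a - b
```

The two trees have the same leaves in the same order but evaluate to 4 and -2 respectively, so the grammar alone does not fix the meaning of the string.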

Parse of 4**2**3+5*6+7 Expr -> Expr + Term | Expr - Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> 0 | ... | 9 | ( Expr )

see powerpoint
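The grouping this grammar imposes can be checked with a small parser. The sketch below is an illustration under assumptions: the left-recursive rules are implemented as loops, Factor is handled by right recursion, characters the tokenizer does not recognize are silently skipped, and the names tokenize, parse and show are invented for the example.

```python
import re


def tokenize(text):
    # "**" must be tried before "*"; unrecognized characters are skipped.
    return re.findall(r"\*\*|[-+*/%()]|[0-9]", text)


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # Expr -> Expr + Term | Expr - Term | Term (left-associative)
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())
        return tree

    def term():  # Term -> Term * Factor | Term / Factor | Term % Factor | Factor
        tree = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            tree = (op, tree, factor())
        return tree

    def factor():  # Factor -> Primary ** Factor | Primary (right-associative)
        base = primary()
        if peek() == "**":
            advance()
            return ("**", base, factor())
        return base

    def primary():  # Primary -> 0 | ... | 9 | ( Expr )
        tok = advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            tree = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def show(tree):
    """Render a tree as a fully parenthesized string."""
    if isinstance(tree, int):
        return str(tree)
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"
```

For 4**2**3+5*6+7 the rendered tree is (((4 ** (2 ** 3)) + (5 * 6)) + 7): ** binds tightest and groups to the right, * comes next, and + groups to the left.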

Parse of the String 5 - 4 +3 Expr → Expr + Term | Expr - Term | Term Term → 0 | ... | 9 | ( Expr )

see powerpoint

Abstract Syntax Tree for z = x + 2*y

see powerpoint lecture 3
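One way to see an abstract syntax tree for this statement is to borrow Python's standard ast module, since z = x + 2*y happens to be valid Python too. This is a stand-in, not the tree from the lecture: node names are Python's, and the simplify helper and OPS table are assumptions made for the example (ast.Constant requires Python 3.8 or later).

```python
import ast

# Map Python operator node types to the symbols used on the cards.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def simplify(node):
    """Reduce a Python AST node to a nested tuple (operator, left, right)."""
    if isinstance(node, ast.Assign):
        return ("=", simplify(node.targets[0]), simplify(node.value))
    if isinstance(node, ast.BinOp):
        return (OPS[type(node.op)], simplify(node.left), simplify(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")


statement = ast.parse("z = x + 2*y").body[0]
tree = simplify(statement)
```

The resulting tuple is ("=", "z", ("+", "x", ("*", 2, "y"))): the assignment is the root, the multiplication sits below the addition because it has higher precedence, and no punctuation survives.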

Arithmetic Expression Grammar

the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr → Expr + Term | Expr - Term | Term Term → 0 | ... | 9 | ( Expr )

Give grammar of language: aⁿ b^m, n ≥ m

To generate extra a's: A → aA | ε. To generate ab, aab, ...: aAb. To generate the same number of a's and b's (aⁿ bⁿ): S → aSb | ε. So: S → aAb | aSb A → aA | ε (this grammar needs m ≥ 1)
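A quick way to sanity-check a grammar like this is to enumerate its short strings. The sketch below is an illustration, not a general tool: it assumes nonterminals are single uppercase letters, writes ε as the empty string, and relies on every production here having at most one nonterminal so the search stays finite. The names GRAMMAR and generate are made up for the example.

```python
from collections import deque

GRAMMAR = {
    "S": ["aAb", "aSb"],
    "A": ["aA", ""],  # "" stands for ε
}


def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if any.
        index = next((i for i, c in enumerate(form) if c in grammar), None)
        if index is None:
            results.add(form)  # only terminals left: a string of the language
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminals = sum(1 for c in new_form if c not in grammar)
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results
```

For max_len = 4 the enumeration yields ab, aab, aaab and aabb, each with at least as many a's as b's.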

Semantic Analysis

• Check that all identifiers are declared • Perform type checking • Insert implied conversion operators (i.e., make them explicit)

