CSCI 4200 Programming Language Test 1

FORTRAN

What programming language has dominated scientific computing over the past 60 years or so?

Others

a. Portability b. Generality c. Well-definedness

Recent Variations in EBNF

• Alternative RHSs are put on separate lines • Use of a colon : instead of -> • Use of opt for optional parts • Use of oneof for choice

Parse Tree

• Parse tree is a hierarchical representation of a derivation • Every internal node of a parse tree is labeled with a nonterminal symbol • Every leaf is labeled with a terminal symbol

Artificial intelligence

- Symbols rather than numbers manipulated; use of linked lists - LISP

Character String Type in Certain Languages

C and C++ - Not primitive - Use char arrays and a library of functions that provide operations SNOBOL4 (a string manipulation language) - Primitive - Many operations, including elaborate pattern matching Fortran and Python - Primitive type with assignment and several operations Java - Primitive via the String class Perl, JavaScript, Ruby, and PHP - Provide built-in pattern matching, using regular expressions

Generators

- A device that generates sentences of a language - One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator

Two(2) parts of 'Syntax Analysis'

- A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar) - A high-level part called a syntax analyzer , or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)

Recognizers

- A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language - Example: syntax analysis part of a compiler

A State Diagram consists of

- A set of states • Start state is indicated by an arrow • Final/Accept state is a double-lined circle - A set of inputs - A set of transitions
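As a rough illustration of the three parts listed above, the sketch below encodes a small state diagram that accepts identifiers (a letter followed by letters or digits). The state names, the `char_class` helper, and the choice of language to recognize are assumptions made for this example, not part of the card.

```python
# Hypothetical states for a diagram that accepts: letter {letter | digit}
START, IN_ID, REJECT = "start", "in_id", "reject"
ACCEPTING = {IN_ID}  # final/accept states

def char_class(ch):
    """Map a character to one of the diagram's inputs."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# The set of transitions: (state, input) -> next state
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}

def accepts(text):
    """Follow transitions from the start state; accept only in a final state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)), REJECT)
        if state == REJECT:
            return False
    return state in ACCEPTING
```

Under these assumptions, `accepts("sum1")` is true, while `accepts("1sum")` and `accepts("")` are false.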

Context-Free Grammars (CFG)

- Developed by Noam Chomsky in the mid-1950s - These language generators were meant to describe the syntax of natural languages - Define a class of languages called context-free languages (CFL) • That is, languages generated by context-free grammars are known as context-free languages.

Web Software

- Eclectic collection of languages: markup (e.g., HTML), scripting (e.g., PHP), general-purpose (e.g., Java)

Writability (flexibility) vs. reliability

- Example: C++ pointers are powerful and very flexible but are unreliable

Reliability vs. cost of execution

- Example: Java demands all references to array elements to be checked for proper indexing, which leads to increased execution costs

Some Language Categories: "Declarative"

- Focuses on what the program should accomplish without specifying how the program should achieve the result. - Functional programming languages (described next) are considered declarative.

Backus-Naur Form (BNF)

- Invented by John Backus to describe the syntax of Algol 58 (1959) - BNF is equivalent to context-free grammars

Scientific applications

- Large numbers of floating point computations; use of arrays - Fortran

Some Language Categories: Functional

- Main means of making computations is by applying functions to given parameters - Examples: LISP, Scheme, ML, F#

Some Language Categories: Markup/programming hybrid

- Markup languages extended to support some programming - Examples: JSTL, XSLT

Systems programming

- Need efficiency because of continuous use - C

Hybrid Implementation System examples:

- Perl programs are partially compiled to detect errors before interpretation - Initial implementations of Java were hybrid; the intermediate form, bytecode , provides portability to any machine that has a byte code interpreter and a run-time system (together, these are called Java Virtual Machine )

Business applications

- Produce reports, use decimal numbers and characters - COBOL

•Most important criteria for evaluating programming languages include:

- Readability, writability, reliability, cost

Some Language Categories: Logic

- Rule-based (rules are specified in no particular order) - Example: Prolog

Some Language Categories: "Imperative"

- Use statements to change a program's state. - Imperative programming focuses on describing how a program operates. - Central features are variables, assignment statements, and iteration - Include •object-oriented languages •scripting languages •visual languages - Examples: C, Java, Perl, JavaScript, Visual BASIC .NET, C++

Lexical Analysis' Convenient utility subprograms:

- getChar - gets the next character of input, puts it in the global variable nextChar, determines its class and puts the class in the global variable charClass. charClass is either a LETTER or DIGIT - addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme. Lexeme could be implemented as a character string or an array. - lookup - determines the token code corresponding to single character lexemes such as '(', '+', '/', etc
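The three utility subprograms can be sketched roughly as follows. This is a minimal Python approximation, not the textbook's code: the `Lexer` class, the token names, and the extra `UNKNOWN` and `EOF` character classes are assumptions added so the example runs on its own.

```python
LETTER, DIGIT, UNKNOWN, EOF = "LETTER", "DIGIT", "UNKNOWN", "EOF"

# Hypothetical token codes for single-character lexemes
SINGLE_CHAR_TOKENS = {"(": "LEFT_PAREN", ")": "RIGHT_PAREN", "+": "ADD_OP",
                      "-": "SUB_OP", "*": "MULT_OP", "/": "DIV_OP"}

class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.lexeme = ""
        self.next_char = ""
        self.char_class = EOF
        self.get_char()

    def get_char(self):
        """Get the next input character and record its class."""
        if self.pos < len(self.source):
            self.next_char = self.source[self.pos]
            self.pos += 1
            if self.next_char.isalpha():
                self.char_class = LETTER
            elif self.next_char.isdigit():
                self.char_class = DIGIT
            else:
                self.char_class = UNKNOWN
        else:
            self.next_char = ""
            self.char_class = EOF

    def add_char(self):
        """Append next_char to the lexeme being accumulated."""
        self.lexeme += self.next_char

    def lookup(self, ch):
        """Return the token code for a single-character lexeme."""
        return SINGLE_CHAR_TOKENS.get(ch, "UNKNOWN")

    def lex(self):
        """Return (token, lexeme) for the next lexeme in the input."""
        while self.char_class != EOF and self.next_char.isspace():
            self.get_char()
        self.lexeme = ""
        if self.char_class == LETTER:
            while self.char_class in (LETTER, DIGIT):
                self.add_char()
                self.get_char()
            return ("IDENT", self.lexeme)
        if self.char_class == DIGIT:
            while self.char_class == DIGIT:
                self.add_char()
                self.get_char()
            return ("INT_LIT", self.lexeme)
        if self.char_class == UNKNOWN:
            token = self.lookup(self.next_char)
            self.add_char()
            self.get_char()
            return (token, self.lexeme)
        return ("EOF", "")

def tokenize(source):
    """Call lex repeatedly until the end of input is reached."""
    lexer = Lexer(source)
    tokens = []
    while True:
        token = lexer.lex()
        tokens.append(token)
        if token[0] == "EOF":
            return tokens
```

For example, `tokenize("(sum + 47) / total")` yields a left parenthesis, the identifier `sum`, an add operator, the integer literal `47`, and so on, ending with an `EOF` token.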

Compilation process has several phases:

- Lexical analysis: converts characters in the source program into lexical units - Syntax analysis: transforms lexical units into parse trees, which represent the syntactic structure of the program - Semantic analysis: generates intermediate code - Code generation: machine code is generated

Data Type: Uses of the Type System

-Error detection- •Type checking -Assistance it provides for program modularization- •Cross-module type checking that ensures the consistency of the interfaces among modules -Program document information about its data- •Provides clues about the program's behavior

BNF and context-free grammars

- Are equivalent meta-languages - Well-suited for describing the syntax of programming languages

Implementation Methods

1. Compilation 2. Pure Interpretation 3. Hybrid Implementation Systems 4. Just-in-Time (JIT) Implementation Systems

Influences on Language Design

1. Computer Architecture 2. Program Design Methodologies

Ambiguity in Grammars

A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
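One way to see ambiguity concretely is to count parse trees. The sketch below assumes a hypothetical ambiguous grammar, `<expr> → <expr> - <expr> | const`, and counts how many distinct parse trees derive a given token list; the function name and grammar are illustrative only.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under <expr> -> <expr> - <expr> | const."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>
        if hi - lo == 1:
            return 1 if tokens[lo] == "const" else 0
        total = 0
        # Try every "-" as the operator at the root of the tree
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "-":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

trees = count_parse_trees(["const", "-", "const", "-", "const"])
```

Here `trees` is 2: one tree groups the first subtraction together and the other groups the second, so this assumed grammar is ambiguous.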

Readability vs. writability

Example: APL provides many powerful operators (and a large number of new symbols), allowing complex computations to be written in a compact program but at the cost of poor readability

The von Neumann Architecture

Fetch-execute-cycle 1.initialize the program counter 2.repeat forever 3. fetch the instruction pointed by the counter 4. increment the counter 5. decode the instruction 6. execute the instruction 7.end repeat
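The cycle above can be sketched on a toy one-register machine. The instruction set (`LOAD`, `ADD`, `HALT`) is invented for this example; `HALT` is added only so the "repeat forever" loop can stop.

```python
def run(program):
    """Run a list of (opcode, operand) pairs on a toy accumulator machine."""
    accumulator = 0
    pc = 0                              # 1. initialize the program counter
    while True:                         # 2. repeat forever
        instruction = program[pc]       # 3. fetch the instruction pointed to by the counter
        pc += 1                         # 4. increment the counter
        opcode, operand = instruction   # 5. decode the instruction
        if opcode == "LOAD":            # 6. execute the instruction
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":          # hypothetical stop instruction
            return accumulator
        else:
            raise ValueError(f"unknown opcode: {opcode}")

result = run([("LOAD", 2), ("ADD", 3), ("HALT", 0)])
```

With this program, `result` is 5.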

Evaluation Criteria: Language design criteria are

Language design criteria are weighed differently from different perspectives.

Evaluation Criteria: Language designers are likely to emphasize

Language designers are likely to emphasize elegance and the ability to attract widespread use.

Evaluation Criteria: Language implementors are concerned primarily with the

Language implementors are concerned primarily with the difficulty of implementing the constructs and features of the language.

Evaluation Criteria: Language users are worried about

Language users are worried about writability first and readability later.

Language Evaluation Criteria

Readability Writability Reliability Cost Others

Associativity of Operators: right associative

Some operators such as exponentiation (**), if provided in the language, are evaluated from right to left. - E.g., 2 ** 3 ** 2 -- 3**2 will be evaluated first • A right recursive rule can be used to specify right associativity • In a right recursive rule, LHS also appears at the right end of its RHS • Example: <factor> → <exp> ** <factor> | <exp> <exp> → ( <expr> ) | id
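A small sketch of how the right recursive rule produces right associativity. To keep it short, `<exp>` is simplified to an integer literal (an assumption; the rule above also allows a parenthesized `<expr>` or an id), and the function name is hypothetical.

```python
def parse_factor(tokens, pos=0):
    """<factor> -> <exp> ** <factor> | <exp>, with <exp> simplified to an integer."""
    base = int(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the whole rest of the input becomes the exponent
        exponent, pos = parse_factor(tokens, pos + 1)
        return base ** exponent, pos
    return base, pos

value, _ = parse_factor("2 ** 3 ** 2".split())
```

Here `value` is 512, that is 2 ** (3 ** 2), rather than (2 ** 3) ** 2 = 64. Python's own `**` operator groups the same way.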

Character "String" Types Operations

Typical operations: - Assignment and copying - Comparison (=, >, etc.) - Catenation - Substring reference - Pattern matching

Readability

a. Overall simplicity - A manageable set of features and constructs - Minimal feature multiplicity - Minimal operator overloading b. Orthogonality - Roughly speaking, this seems to mean intuitiveness: if a programming language behaves the way we expect it to, it can be said to have good orthogonality. c. Data types d. Syntax considerations

Writability

a. Simplicity and orthogonality b. Support for abstraction c. Expressivity

Cost

a. Training programmers to use the language b. Writing programs (closeness to particular applications) c. Compiling programs d. Executing programs e. Language implementation system f. Reliability: poor reliability leads to high costs g. Maintaining programs - Corrections and modifications to add new functionality h. Three most important contributors to cost:

Reliability

a. Type checking b. Exception handling c. Aliasing d. Readability and Writability

Recursive-Descent Parsing

based on an EBNF grammar - Detects syntax errors - Produces a parse tree • There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal • EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals

Evaluation Criteria:

characteristics often conflict with one another.

Parsing Problem: Goals of the parser,

given an input program: - Find all syntax errors; for each, produce an appropriate diagnostic message and recover quickly - Produce the parse tree, or at least a trace of the parse tree, for the program

Two categories of parsers - Bottom up -

produce the parse tree, beginning at the leaves • Begin at the target string and try to arrive back at the start symbol. • An LR parser is a type of bottom-up parser

Two categories of parsers - Top down -

produce the parse tree, beginning at the root • Begin at the start symbol and try to apply productions to arrive at the target string • An LL parser is a type of top-down parser

Data Type: "Descriptor"

the collection of the attributes of a variable • A descriptor is an area of memory that stores the attributes of a variable • Static Attributes: descriptors are built and used at compile time, usually as a part of the symbol table • Dynamic Attributes: part or all of the descriptor must be maintained during execution

Syntax

the form or structure of the expressions, statements, and program units

Semantics

the meaning of the expressions, statements, and program units

User-defined operator overloading can harm the readability of a program (T/F)

true

Reasons to Separate Lexical and Syntax Analysis

• Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser • Efficiency - separation allows optimization of the lexical analyzer • Portability - parts of the lexical analyzer may not be portable, but the parser always is portable

Programming Methodologies Influences

• 1950s and early 1960s: Simple applications; worry about machine efficiency • Late 1960s: People efficiency became important; readability, better control structures - structured programming - top-down design and step-wise refinement • Late 1970s: Process-oriented to data-oriented - data abstraction • Middle 1980s: Object-oriented programming - Data abstraction + inheritance + polymorphism

Programming Environments

• A collection of tools used to facilitate software development • UNIX - An older operating system and tool collection - Nowadays often used through a GUI (e.g., CDE, KDE, or GNOME) that runs on top of UNIX • Microsoft Visual Studio.NET - A large, complex visual environment - Used to build Web applications and non-Web applications in any .NET language • NetBeans - Lets you quickly and easily develop Java desktop, mobile, and web applications, as well as HTML5 applications

Hybrid Implementation Systems

• A compromise between compilers and pure interpreters • A high-level language program is translated to an intermediate language that allows easy interpretation • Faster than pure interpretation

Lexical Analysis

• A lexical analyzer is a pattern matcher for character strings • A lexical analyzer is a "front-end" for the parser • Identifies substrings of the source program that belong together - lexemes

Derivations (Continued)

• A sentence is a sentential form that has only terminal symbols • A sentence can be derived using the following algorithm: 1. String := Start Symbol 2. REPEAT 3. Choose any nonterminal in String. 4. Find a production with this nonterminal on the left-hand side. 5. Replace the nonterminal with one of the options on the right-hand side of the production. 6. UNTIL String contains only terminals. • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is expanded • A derivation may be either leftmost or rightmost
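The derivation algorithm can be sketched as below, restricted to leftmost derivations. The grammar is a small hypothetical one, and the `choices` list stands in for "choose one of the options on the right-hand side"; both are assumptions for illustration.

```python
# Hypothetical grammar: nonterminal -> list of right-hand sides
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [["<id>", "+", "<expr>"], ["<id>"]],
}

def leftmost_derivation(start, choices):
    """Return each sentential form produced by expanding the leftmost nonterminal.

    choices[i] is the index of the RHS option used at step i.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost nonterminal in the current sentential form
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        # Replace it with the chosen right-hand side
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("<assign>", [0, 0, 0, 1, 1, 2])
```

The resulting `steps` run from `<assign>` through forms such as `A = <id> + <expr>` to the sentence `A = B + C`, which contains only terminals.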

The General Problem of Describing Syntax: Terminology

• A sentence is a string of characters over some alphabet • A language is a set of sentences • A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin) • A token is a category of lexemes (e.g., identifier)

Primitive Data Types

• Almost all programming languages provide a set of primitive data types • Primitive data types: Those not defined in terms of other data types • Some primitive data types are merely reflections of the hardware • Others require only a little non-hardware support for their implementation

Recursive-Descent Parsing (continued)

• Assume we have a lexical analyzer named lex, which puts the next token code in nextToken • The coding process when there is only one RHS: - For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error - For each nonterminal symbol in the RHS, call its associated parsing subprogram

BNF vs. EBNF

• BNF <expr> → <expr> + <term> | <expr> - <term> | <term> <term> → <term> * <factor> | <term> / <factor> | <factor> • EBNF <expr> → <term> {(+ | -) <term>} <term> → <factor> {(* | /) <factor>}
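The EBNF rules above map fairly directly onto recursive-descent subprograms, with each `{ ... }` repetition becoming a loop. The sketch below is one possible rendering; the `<factor>` rule (an integer literal or a parenthesized expression), the tuple-based tree shape, and all names are assumptions, since the card does not define them.

```python
import re

def tokenize(text):
    """Split the input into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # <expr> -> <term> {(+ | -) <term>}
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # <term> -> <factor> {(* | /) <factor>}
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Assumed rule: <factor> -> int_literal | ( <expr> )
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token: {token!r}")

def parse(text):
    """Parse a whole expression and return a nested (op, left, right) tuple."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return tree
```

For instance, `parse("8 - 3 - 2")` returns `("-", ("-", 8, 3), 2)`, so the loop keeps subtraction left associative, and `parse("1 + 2 * 3")` returns `("+", 1, ("*", 2, 3))`, reflecting the higher precedence of `*`.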

Von Neumann Bottleneck

• Connection speed between a computer's memory and its processor determines the speed of a computer • Program instructions often can be executed much faster than the speed of the connection; the connection speed thus results in a bottleneck • Known as the von Neumann bottleneck ; it is the primary limiting factor in the speed of computers

Derivations

• Every string of symbols, including the start symbol, derived from the start symbol is a sentential form • For example, given the previous grammar, each of the following is a sentential form - <program> - begin <stmt_list> end - begin <stmt> ; <stmt_list> end - begin A = B + C ; B = <var> end - begin A = B + C ; B = C end

BNF (more description)

• In BNF, abstractions are used to represent syntactic structures - Example: <assign>, below, is a representation of assignment statement in Java <assign> → <var> = <expression> • The above structure is called a rule, production, or production rule • A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals • Example sentence based on the above rule: - total = subtotal1 + subtotal2

Reasons for Studying Concepts of Programming Languages

• Increased ability to express ideas • Improved background for choosing appropriate languages • Increased ability to learn new languages • Better understanding of significance of implementation • Better use of languages that are already known • Overall advancement of computing

Just-in-Time (JIT) Implementation Systems

• Initially translate programs to an intermediate language • Then compile the intermediate language of the subprograms into machine code when they are called • Machine code version is kept for subsequent calls • JIT systems are widely used for Java programs • .NET languages are implemented with a JIT system • In essence, JIT systems are delayed compilers

Additional Compilation Terminologies:

• Linking and loading: the process of collecting system program units and linking them to a user program • Load module (executable image): the user and system code together

Primitive Data Types: "Floating Point"

• Model real numbers, but only as approximations • Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more) • Usually exactly like the hardware, but not always • IEEE Floating-Point Standard 754
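A brief illustration of "only as approximations", assuming the usual case where Python's `float` is an IEEE 754 double. The variable names are arbitrary, and the `struct` round trip is used here only to mimic a 32-bit `float`.

```python
import math
import struct

total = 0.1 + 0.2
exact_match = (total == 0.3)              # False: neither side is stored exactly
close_enough = math.isclose(total, 0.3)   # True: compare with a tolerance instead

# Round-trip 0.1 through a 32-bit float to show the coarser approximation
single = struct.unpack("f", struct.pack("f", 0.1))[0]
loses_precision = (single != 0.1)         # True: 32 bits hold fewer digits than 64
```

This is why floating-point values are usually compared with a tolerance rather than with `==`.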

Pure Interpretation

• No translation • Programs are interpreted by another program known as an interpreter • Used for small programs or when efficiency is not an issue • Easier implementation of programs (runtime errors can easily and immediately be displayed) • Slower execution (10 to 100 times slower than compiled programs) • Often requires more space • Now rare for traditional high-level languages • Significant comeback with some Web scripting languages (e.g., JavaScript, PHP)

Extended BNF (EBNF)

• Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] • Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term> (+|-) const • Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}

Preprocessors

• Preprocessor macros (instructions) are commonly used to specify that code from another file is to be included • A preprocessor processes a program immediately before the program is compiled to expand embedded preprocessor macros • A well-known example: C preprocessor - expands #include, #define, and similar macros

Advantages of Using BNF to Describe Syntax

• Provides a clear and concise syntax description • The parser can be based directly on the BNF • Parsers based on BNF are easy to maintain

Programming Domains (examples)

• Scientific applications • Business applications • Artificial intelligence • Systems programming • Web Software

Primitive Data Types: "Complex"

• Some languages support a complex type, e.g., C99, Fortran, and Python • Each value consists of two floats, the real part and the imaginary part • Literal form of a complex value in Python: (7 + 3j), where 7 is the real part and 3 is the imaginary part
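The Python literal mentioned above can be tried directly; the variable names here are arbitrary.

```python
z = (7 + 3j)

real_part = z.real        # 7.0, stored as a float
imaginary_part = z.imag   # 3.0, stored as a float
squared = z * z           # (7 + 3j)^2 = 49 + 42j - 9 = 40 + 42j
```

Both parts are floats, which matches the description of a complex value as two floats.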

Primitive Data Types: "Character"

• Stored as numeric codings • Most commonly used coding was: ASCII - A 7-bit code, usually stored in 8 bits - Values 0-127 represented 128 different characters • 16-bit coding: Unicode (UCS-2) - Includes characters from most natural languages - Originally used in Java - C#, Python, Perl and JavaScript also support Unicode • 32-bit Unicode (UCS-4) - Supported by Fortran, starting with 2003 • Every implementation of the Java platform is required to support the following standard charsets:

Describing Syntactic Lists

• Syntactic lists are described using recursion <ident_list> → ident | ident, <ident_list> • A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (i.e., all terminal symbols)
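The recursive rule for `<ident_list>` can be mirrored by a recursive function. This is a minimal sketch; the function name and the use of `str.isidentifier` to stand in for the `ident` token are assumptions.

```python
def parse_ident_list(tokens):
    """Return the identifiers in a comma-separated list, following the recursive rule."""
    if not tokens or not tokens[0].isidentifier():
        raise SyntaxError("expected an identifier")
    if len(tokens) == 1:
        return [tokens[0]]                               # <ident_list> -> ident
    if tokens[1] != ",":
        raise SyntaxError("expected ','")
    return [tokens[0]] + parse_ident_list(tokens[2:])    # ident, <ident_list>

names = parse_ident_list(["a", ",", "b", ",", "c"])
```

Here `names` is `["a", "b", "c"]`; a trailing comma, as in `["a", ","]`, raises `SyntaxError` because the recursive call finds no identifier.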

BNF Fundamentals (continued)

• Terminal symbols are literal symbols and cannot be changed by the production rules of the grammar • Terminals are lexemes or tokens • Nonterminal symbols are the ones that can be replaced by other terminal and nonterminal symbols using production rules • Nonterminals are often enclosed in angle brackets, <>

Ambiguous to Unambiguous Grammar Conversion

• The correct ordering of operations is specified by using separate nonterminal symbols • Nonterminals represent the operands of the operators that have different precedence. • This requires additional nonterminals and some new rules. - For example, <expr> for "-", and <term> for "/" operations below. <expr> → <expr> - <term> | <term> <term> → <term> / const | const

Lexical Analysis (continued)

• The lexical analyzer is usually a function that is called by the parser when it needs the next token • One of the approaches to building a lexical analyzer: - Design a state diagram that describes the tokens and write a program that implements the state diagram

Top-down Parser

• Traces or builds a parse tree in preorder. - A preorder traversal of a parse tree begins with the root. - Each node is visited before its branches are followed. - Branches from a particular node are followed in left-to-right order. - This corresponds to a leftmost derivation
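The preorder visiting order described above can be sketched as a short traversal. The tree representation, a `(label, children)` pair, and the sample tree for `A = B` are assumptions for illustration.

```python
def preorder(node):
    """Yield labels in preorder: a node first, then its branches left to right."""
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)

# Hypothetical parse tree for the sentence: A = B
tree = ("<assign>", [
    ("<id>", [("A", [])]),
    ("=", []),
    ("<expr>", [("<id>", [("B", [])])]),
])

order = list(preorder(tree))
```

The resulting `order` starts at the root `<assign>` and reaches the leaves `A`, `=`, `B` in left-to-right order, which is the order a top-down parser would trace them.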

Compilation

• Translate high-level program (source language) into machine code (machine language) • Used in large commercial applications • Slow translation, fast execution

Computer Architecture Influence

• Well-known computer architecture: Von Neumann • Imperative languages, most dominant, because of von Neumann computers - Data and programs stored in memory - Memory is separate from CPU - Instructions and data are piped from memory to CPU - Basis for imperative languages

Associativity of Operators: left associative

• When two operators have the same precedence (e.g., * and /), they are evaluated left-to-right - E.g., A / B * C / D -- A /B will be evaluated first • A left recursive rule can be used to specify left associativity • In a left recursive rule, LHS also appears at the beginning of its RHS
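A quick check of the A / B * C / D example, using sample values chosen (as an assumption) so that the two possible groupings give different results.

```python
A, B, C, D = 12, 2, 3, 6

left_to_right = ((A / B) * C) / D   # (6.0 * 3) / 6 = 3.0
right_to_left = A / (B * (C / D))   # 12 / (2 * 0.5) = 12.0
unparenthesized = A / B * C / D     # Python applies / and * left to right
```

Here `unparenthesized` equals `left_to_right` (3.0), not `right_to_left` (12.0), so A / B is indeed evaluated first.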

The TIOBE Programming Community index

• an indicator of the popularity of programming languages. • The TIOBE index is not about the best programming language or the language in which most lines of code have been written. • The index is updated once a month. • The ratings are based on the number of skilled engineers world-wide, courses and third party vendors. • Popular search engines such as Google, Bing, Yahoo!, Wikipedia, Amazon, YouTube and Baidu are used to calculate the ratings. • The index can be used to check whether your programming skills are still up to date or to make a strategic decision about what programming language should be adopted when starting to build a new software system.

Primitive Data Types: "Integer"

•Almost always an exact reflection of the hardware so the mapping is trivial •There may be as many as eight different integer types in a language •Java's signed integer sizes: byte, short, int, long

Primitive Data Types: "Decimal"

•For business applications (money) - Essential to COBOL - C# offers a decimal data type •Store a fixed number of decimal digits, in coded form (BCD) • Advantage : accuracy • Disadvantages : limited range, wastes memory
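The accuracy advantage can be illustrated with Python's standard `decimal` module. It serves here only as a stand-in for a decimal type; this sketch makes no claim that it stores digits in BCD.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly
float_total = 0.10 + 0.20
float_is_exact = (float_total == 0.30)                # False

# Decimal values keep the decimal digits that were written
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))  # True
```

For money amounts, the decimal sum is exactly 0.30, while the float sum is only close to it.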

Primitive Data Types: "Boolean"

•Simplest of all •Range of values: two elements, one for "true" and one for "false" •Could be implemented as bits, but often as bytes -Advantage: readability

Character "String" Types

•Values are sequences of characters •Design issues: -Is it a primitive type or just a special kind of array? -Should the length of strings be static or dynamic?

State Diagram

(or Finite State Diagram) is a directed graph used to describe the behavior of a system

