Chapter 6 (excluding 6.5.4 and 6.7), Chapter 7 (7.1, 7.2, 7.4), and Chapter 8 (8.5 only)


Type Clash

A violation of the type compatibility rules is known as a type clash.

procedural abstraction

A potentially complex collection of control constructs (a subroutine) is encapsulated in a way that allows it to be treated as a single unit, usually subject to parameterization.

Prefix Notation

operator appears before its operands

Conditional expression

This sort of multiword infix notation occurs occasionally in other languages as well. In Algol one can say

    a := if b <> 0 then a/b else 0;

Here "if ... then ... else" is a three-operand infix operator. The equivalent operator in C is written "... ? ... : ...":

    a = b != 0 ? a/b : 0;
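Python has the same kind of three-operand expression, though it orders the operands as value, condition, alternative. A minimal sketch; the function name `safe_divide` is hypothetical:

```python
def safe_divide(a, b):
    # "a / b if b != 0 else 0" is one expression with three operands,
    # Python's counterpart of Algol's if...then...else and C's ?:
    return a / b if b != 0 else 0

print(safe_divide(6, 3))  # 2.0
print(safe_divide(6, 0))  # 0
```

Only the selected operand is evaluated, so the division is never attempted when b is 0.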

Reasons why initial values are useful:

1. As suggested in Figure 3.3, a static variable that is local to a subroutine needs an initial value in order to be useful.
2. For any statically allocated variable, an initial value that is specified in the declaration can be preallocated in global memory by the compiler, avoiding the cost of assigning an initial value at run time.
3. Accidental use of an uninitialized variable is one of the most common programming errors. One of the easiest ways to prevent such errors (or at least ensure that erroneous behavior is repeatable) is to give every variable a value when it is first declared.

Most languages allow variables of built-in types to be initialized in their declarations. A more complete and orthogonal approach to initialization requires a notation for aggregates: built-up structured values of user-defined composite types. It should be emphasized that initialization saves time only for variables that are statically allocated.

A type system consists of:

1. a mechanism to define types and associate them with certain language constructs
2. a set of rules for type equivalence, type compatibility, and type inference

The constructs that must have types are precisely those that have values, or that can refer to objects that have values. These constructs include named constants, variables, record fields, parameters, and sometimes subroutines; literal constants (e.g., 17, 3.14, "foo"); and more complicated expressions containing these.

Orthogonality

A common design goal is to make the various features of a language as orthogonal as possible. Orthogonality means that features can be used in any combination, the combinations all make sense, and the meaning of a given feature is consistent, regardless of the other features with which it is combined.

Numeric Types (ints and floats)

A few languages (e.g., C and Fortran) distinguish between different lengths of integers and real numbers; most do not, and leave the choice of precision to the implementation. Unfortunately, differences in precision across language implementations lead to a lack of portability: programs that run correctly on one system may produce run-time errors or erroneous results on another. Java and C# are unusual in providing several lengths of numeric types, with a specified precision for each. A few languages, including C, C++, C#, and Modula-2, provide both signed and unsigned integers (Modula-2 calls unsigned integers cardinals). Several languages support decimal types that use a base-10 encoding to avoid round-off anomalies in financial and human-centered arithmetic (see Sidebar 7.4).

Decimal type

A few languages, notably Cobol and PL/I, provide a decimal type for fixed-point representation of integer quantities. These types were designed primarily to exploit the binary-coded decimal (BCD) integer format supported by many traditional CISC machines.

Mixfix notation in Smalltalk

A few languages, notably ML and the R scripting language, allow the user to create new infix operators. Smalltalk uses infix notation for all functions (which it calls messages), both built-in and user-defined. The following Smalltalk statement sends a "displayOn: at:" message to graphical object myBox, with arguments myScreen and 100@50 (a pixel location). It corresponds to what other languages would call the invocation of the "displayOn: at:" function with arguments myBox, myScreen, and 100@50.

    myBox displayOn: myScreen at: 100@50

Iteration

A given fragment of code is to be executed repeatedly, either a certain number of times, or until a certain run-time condition is true. Iteration constructs include for/do, while, and repeat loops. In most languages, iteration takes the form of loops. Like the statements in a sequence, the iterations of a loop are generally executed for their side effects: their modifications of variables.

Hash table

A hash table is attractive if the set of label values is large, but has many missing values and no large ranges. With an appropriate hash function it will choose the right arm in O(1) time. Unfortunately, a hash table, like a jump table, requires a separate entry for each possible value of the tested expression, making it unsuitable for statements with large value ranges.

Reference model

A language can make the distinction between l-values and r-values more explicit by employing a reference model of variables. Languages that do this include Algol 68, Clu, Lisp/Scheme, ML, and Smalltalk. In these languages, a variable is not a named container for a value; rather, it is a named reference to a value. In a language that uses the reference model, every variable is an l-value.
_____________________________________________________________________________
LECTURE: In a reference model, a variable is a named reference to a value: it refers to a value rather than acting as a container that holds the value itself. In languages that use a reference model:
1. Every variable is an l-value.
2. When a variable appears in a context that expects an r-value, it must be dereferenced to obtain the value to which it refers.
3. In most languages that use a reference model, this dereferencing is implicit and automatic.
4. In ML, the programmer must use an explicit dereference operator (!).

strict name equivalence

A language in which aliased types are considered distinct is said to have strict name equivalence.

loose name equivalence

A language in which aliased types are considered equivalent is said to have loose name equivalence.

Statically Typed Language

A language is said to be statically typed if it is strongly typed and type checking can be performed at compile time. In the strictest sense of the term, few languages are statically typed. In practice, the term is often applied to languages in which most type checking can be performed at compile time, and the rest can be performed at run time.

Post-test loop

A loop that executes the body, then tests for the exit condition. Python has no special construct for this, but the same effect can be obtained by combining while and break. The difference between pre-test and post-test loops is particularly important when the body of the loop is longer. Note that the body of a post-test loop is always executed at least once.
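A minimal sketch of the while/break idiom, assuming a hypothetical helper `digits` that lists the decimal digits of a non-negative integer. The body must run at least once, otherwise 0 would produce no digits at all:

```python
def digits(n):
    """Return the decimal digits of a non-negative integer, least significant first."""
    result = []
    while True:
        result.append(n % 10)   # body: always executes at least once
        n //= 10
        if n == 0:              # exit test comes after the body
            break
    return result

print(digits(0))    # [0]
print(digits(407))  # [7, 0, 4]
```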

Exception handling and speculation

A program fragment is executed optimistically, on the assumption that some expected condition will be true. If that condition turns out to be false, execution branches to a handler that executes in place of the remainder of the protected fragment (in the case of exception handling) or in place of the entire protected fragment (in the case of speculation). For speculation, the language implementation must be able to undo, or "roll back," any visible effects of the protected code.

Strongly Typed Language

A language is said to be strongly typed if it prohibits, in a way that the language implementation can enforce, the application of any operation to any object that is not intended to support that operation. Strong typing is not the same as requiring programmers to declare each variable's data type explicitly: a strongly typed language may infer types (as ML does) or check them at run time.

Safety vs Performance

A recurring theme in any comparison between C++ and Java is the latter's willingness to accept additional run-time cost in order to obtain cleaner semantics or increased reliability. Definite assignment is one example: it may force the programmer to perform "unnecessary" initializations on certain code paths, but in so doing it avoids the many subtle errors that can arise from missing initialization in other languages. Similarly, the Java specification mandates automatic garbage collection, and its reference model of user-defined types forces most objects to be allocated in the heap. As we shall see in future chapters, Java also requires both dynamic binding of all method invocations and run-time checks for out-of-bounds array references, type clashes, and other dynamic semantic errors. Clever compilers can reduce or eliminate the cost of these requirements in certain common cases, but for the most part the Java design reflects an evolutionary shift away from performance as the overriding design goal.

Denotational semantics

A set of values is known as a domain. Types are domains, and the meaning of an expression is a value from the domain that represents the expression's type. Some domains—the integers, for example—are simple and familiar. Others are more complex. An array can be thought of as a value from a domain whose elements are functions; each of these functions maps values from some finite index type (typically a subset of the integers) to values of some other element type. As it turns out, denotational semantics can associate a type with everything in a program—even statements with side effects. The meaning of an assignment statement is a value from a domain of higher-level functions, each of whose elements maps a store—a mapping from names to values that represents the current contents of memory—to another store, which represents the contents of memory after the assignment.

Dynamic Typing

A style of typing in which a variable can refer to objects of different types over time, simply by being reassigned; types are determined and checked at run time. Python is dynamically typed. Thus, unlike in a statically typed language such as C, a variable can first be assigned a string, then an integer, and later a list, just by making the appropriate assignment statements. This frees the programmer from managing many details, but comes at a performance cost.
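A small Python sketch of the string, integer, list sequence described above. The list `seen` and the flag `clash_caught` exist only to record what happens:

```python
seen = []
x = "hello"                  # x refers to a str...
seen.append(type(x).__name__)
x = 42                       # ...then to an int, with no declaration...
seen.append(type(x).__name__)
x = [1, 2, 3]                # ...then to a list
seen.append(type(x).__name__)
print(seen)                  # ['str', 'int', 'list']

# Types are checked only when an operation actually runs:
try:
    x + 1                    # list + int: a type clash detected at run time
    clash_caught = False
except TypeError:
    clash_caught = True
print(clash_caught)          # True
```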

Tail recursion

A tail-recursive function is one in which additional computation never follows a recursive call: the return value is simply whatever the recursive call returns. For such functions, dynamically allocated stack space is unnecessary: the compiler can reuse the space belonging to the current iteration when it makes the recursive call.
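A sketch using the greatest common divisor (Euclid's remainder form, chosen here for illustration). CPython does not perform tail-call elimination, so the second function shows by hand the loop a compiler could produce by reusing the current frame:

```python
def gcd_rec(a, b):
    """Tail-recursive: nothing remains to be done after the recursive call."""
    if b == 0:
        return a
    return gcd_rec(b, a % b)    # the call's result is returned unchanged

def gcd_iter(a, b):
    """The same computation with the frame reused: rebind parameters, jump to top."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd_rec(48, 18), gcd_iter(48, 18))  # 6 6
```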

Why is initialization cheaper than default initialization followed by assignment?

A typical example occurs in variable-length character strings. An assignment to such a string must generally deallocate the space consumed by the old value of the string before allocating space for the new value. An initialization of the string must simply allocate space. Initialization with a nontrivial value is generally cheaper than default initialization followed by assignment, because it avoids deallocation of the space allocated for the default value.

Expression-Oriented Language

Algol 68 is said to be expression-oriented: it has no separate notion of statement. Arbitrary expressions can appear in contexts that would call for a statement in many other languages, and constructs that are considered to be statements in other languages can appear within expressions.

Side-effect freedom

Among other things, side-effect freedom ensures that a Euclid or Turing function, like its counterpart in mathematics, is always idempotent: if called repeatedly with the same set of arguments, it will always return the same value, and the number of consecutive calls (after the first) will not affect the results of subsequent execution. In addition, side-effect freedom for functions means that the value of a subexpression will never depend on whether that subexpression is evaluated before or after calling a function in some other subexpression. These properties make it easier for a programmer or theorem-proving system to reason about program behavior. They also simplify code improvement, for example by permitting the safe rearrangement of expressions.

Enums as constants

An alternative to enumerations, of course, is simply to declare a collection of constants:

    const sun = 0; mon = 1; tue = 2; wed = 3; thu = 4; fri = 5; sat = 6;

In C, the difference between the two approaches is purely syntactic:

    enum weekday {sun, mon, tue, wed, thu, fri, sat};

is essentially equivalent to:

    typedef int weekday;
    const weekday sun = 0, mon = 1, tue = 2, wed = 3, thu = 4, fri = 5, sat = 6;

In Pascal and most of its descendants, however, the difference between an enumeration and a set of integer constants is much more significant: the enumeration is a full-fledged type, incompatible with integers. Using an integer or an enumeration value in a context expecting the other will result in a type clash error at compile time.
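Python's standard `enum` module happens to offer both flavors, which makes a rough analogy. Because Python is dynamically typed there is no compile-time type clash; a plain `Enum` value simply never equals an integer, while an `IntEnum` value behaves like the C-style named constant:

```python
from enum import Enum, IntEnum

# Pascal-like: a distinct type whose values are not integers.
Weekday = Enum("Weekday", "SUN MON TUE WED THU FRI SAT", start=0)

# C-like: named integer constants that mix freely with ints.
CWeekday = IntEnum("CWeekday", "SUN MON TUE WED THU FRI SAT", start=0)

print(Weekday.MON == 1)   # False: an enumeration value is not an integer
print(CWeekday.MON == 1)  # True: behaves like the constant 1
print(CWeekday.MON + 1)   # 2
```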

Loops come in two principal varieties

An enumeration-controlled loop is executed once for every value in a given finite set; the number of iterations is known before the first iteration begins. A logically controlled loop is executed until some Boolean condition (which must generally depend on values altered in the loop) changes value. Most (though not all) languages provide separate constructs for these two varieties of loop.

Recursion

An expression is defined in terms of (simpler versions of) itself, either directly or indirectly; the computational model requires a stack on which to save information about partially evaluated instances of the expression. Recursion is usually defined by means of self-referential subroutines.

Boxing and Unboxing

C# and more recent versions of Java perform automatic boxing and unboxing operations that avoid the wrapper syntax in many cases:

    ht.put(13, 31);
    int m = (Integer) ht.get(13);

Here the Java compiler creates hidden Integer objects to hold the values 13 and 31, so they may be passed to put as references.

Imperative language assignments

Computation typically consists of an ordered series of changes to the values of variables in memory. Assignments provide the principal means by which to make the changes. Each assignment takes a pair of arguments: a value and a reference to a variable into which the value should be placed.

Continuations in Scheme

Continuation support in Scheme takes the form of a function named call-with-current-continuation, often abbreviated call/cc. This function takes a single argument f, which is itself a function of one argument. Call/cc calls f, passing as argument a continuation c that captures the current program counter, referencing environment, and stack backtrace. The continuation is implemented as a closure, indistinguishable from the closures used to represent subroutines passed as parameters. At any point in the future, f can call c, passing it a value, v. The call will "return" v into c's captured context, as if it had been returned by the original call to call/cc.

Three popular ways to formalize the notion of "type"

Denotational, structural, and abstraction-based

Selection

Depending on some run-time condition, a choice is to be made among two or more statements or expressions. The most common selection constructs are if and case (switch) statements. Selection is also sometimes referred to as alternation. Put another way, the purpose of the Boolean expression in a selection statement is not to compute a value to be stored, but to cause control to branch to various locations. This observation allows us to generate particularly efficient code (called jump code) for expressions that are amenable to short-circuit evaluation.
_____________________________________________________________________________
LECTURE: Selection uses some variant of if...then...else notation. What is the main purpose of the Boolean expression in a selection? Is it to compute a value to be stored, or to cause control to branch to various locations? (The latter.) Most machines provide conditional branch instructions that capture simple comparisons. Taking advantage of these branch instructions, compilers can generate efficient machine code (jump code) for Boolean expressions if the language allows short-circuit Boolean evaluation.

Functional Language expressions

In a purely functional language, expressions are the building blocks of programs, and computation consists entirely of expression evaluation. The effect of any individual expression on the overall computation is limited to the value that expression provides to its surrounding context. Complex computations employ recursion to generate a potentially unbounded number of values, expressions, and contexts. Purely functional languages have no side effects. As a result, the value of an expression in such a language depends only on the referencing environment in which the expression is evaluated, not on the time at which the evaluation occurs.

Logically Controlled Loops

In comparison to enumeration-controlled loops, logically controlled loops have many fewer semantic subtleties. The only real question to be answered is where within the body of the loop the terminating condition is tested. By far the most common approach is to test the condition before each iteration. The familiar while loop syntax for this was introduced in Algol-W: while condition do statement To allow the body of the loop to be a statement list, most modern languages use an explicit concluding keyword (e.g., end), or bracket the body with delimiters (e.g., { ...}). A few languages (notably Python) indicate the body with an extra level of indentation.

Passing the "loop body" to an iterator in Scheme

In functional languages, the ability to specify a function "in line" facilitates a programming idiom in which the body of a loop is written as a function, with the loop index as an argument. This function is then passed as the final argument to an iterator, which is itself a function. In Scheme we might write

    (define uptoby
      (lambda (low high step f)
        (if (<= low high)
            (begin
              (f low)
              (uptoby (+ low step) high step f))
            '())))

We could then sum the first 50 odd numbers as follows:

    (let ((sum 0))
      (uptoby 1 100 2
        (lambda (i)
          (set! sum (+ sum i))))
      sum)    ⇒ 2500

Here the body of the loop, (set! sum (+ sum i)), is an assignment. The ⇒ symbol (not a part of Scheme) is used here to mean "evaluates to."
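A rough Python port of the same idiom, for comparison. The names `upto_by` and `sum_first_50_odds` are hypothetical. A Python lambda cannot contain an assignment, so the loop body is a nested function that updates the enclosing total through `nonlocal`:

```python
def upto_by(low, high, step, body):
    """Call body(i) for i = low, low + step, ... while i <= high."""
    # Recursive like the Scheme version; CPython has no tail-call
    # elimination, so very long ranges would exhaust the stack.
    if low <= high:
        body(low)
        upto_by(low + step, high, step, body)

def sum_first_50_odds():
    total = 0
    def body(i):            # the "loop body", closing over total
        nonlocal total
        total += i
    upto_by(1, 100, 2, body)
    return total

print(sum_first_50_odds())  # 2500
```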

Distinguished Values for enums in Java

In recent versions of Java one can obtain a similar effect by giving values an extra field (here named register):

    enum arm_special_regs {
        fp(7), sp(13), lr(14), pc(15);
        private final int register;
        arm_special_regs(int r) { register = r; }
        public int reg() { return register; }
    }
    ...
    int n = arm_special_regs.fp.reg();

unwinding

In the event of a nonlocal goto, the language implementation must guarantee to repair the run-time stack of subroutine call information. This repair operation is known as unwinding. It requires not only that the implementation deallocate the stack frames of any subroutines from which we have escaped, but also that it perform any bookkeeping operations, such as restoration of register contents, that would have been performed when returning from those routines.

Dynamic Check

Instead of giving every uninitialized variable a default value, a language or implementation can choose to define the use of an uninitialized variable as a dynamic semantic error, and can catch these errors at run time. The advantage of the semantic checks is that they will often identify a program bug that is masked or made more subtle by the presence of a default value. With appropriate hardware support, uninitialized variable checks can even be as cheap as default values, at least for certain types. In particular, a compiler that relies on the IEEE standard for floating-point arithmetic can fill uninitialized floating-point numbers with a signaling NaN value, as discussed in Section C-5.2.2. Any attempt to use such a value in a computation will result in a hardware interrupt, which the language implementation may catch (with a little help from the operating system), and use to trigger a semantic error message.

Discrete Types (ordinal types)

Integers, Booleans, and characters are all examples of discrete types (also called ordinal types): the domains to which they correspond are countable (they have a one-to-one correspondence with some subset of the integers), and have a well-defined notion of predecessor and successor for each element other than the first and the last. (In most implementations the number of possible integers is finite, but this is usually not reflected in the type system.) Two varieties of user-defined types, enumerations and subranges, are also discrete. Discrete, rational, real, and complex types together constitute the scalar types. Scalar types are also sometimes called simple types.

Most common built-in types

Integers, characters, Booleans, real (floating-point) numbers

Polymorphism

Polymorphism applies to code (both data structures and subroutines) that is designed to work with values of multiple types. To maintain correctness, the types must generally have certain characteristics in common, and the code must not depend on any other characteristics.

Two common initialization errors

It is also worth noting that the problem of using an uninitialized variable occurs not only after elaboration, but also as a result of any operation that destroys a variable's value without providing a new one. Two of the most common such operations are explicit deallocation of an object referenced through a pointer and modification of the tag of a variant record.

Operator

It is conventional to use the term operator for built-in functions that use special, simple syntax (e.g., + and *).

Iteration and Recursion

Iteration is in some sense the more "natural" of the two in imperative languages, because it is based on the repeated modification of variables. Recursion is the more natural of the two in functional languages, because it does not change variables. In the final analysis, which to use in which circumstance is mainly a matter of taste.

Definite Assignment

Java and C# define a notion of definite assignment that precludes the use of uninitialized variables. Roughly speaking, every possible control path to an expression must assign a value to every variable in that expression. This is a conservative rule; it can sometimes prohibit programs that would never actually use an uninitialized variable.

Languages uses for value and reference models

Java uses a value model for built-in types and a reference model for user-defined types (classes). C# and Eiffel allow the programmer to choose between the value and reference models for each individual user-defined type. A C# class is a reference type; a struct is a value type.

jump code

Jump code is applicable not only to selection statements such as if ... then ... else, but to logically controlled loops as well. In the usual process of code generation, a synthesized attribute of the root of an expression subtree acquires the name of a register into which the value of the expression will be computed at run time. The surrounding context then uses this register name when generating code that uses the expression. In jump code, inherited attributes of the root inform it of the addresses to which control should branch if the expression is true or false, respectively.
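The inherited-attribute scheme can be sketched as a tiny code generator in which each node receives its true and false targets from its parent. Assumptions: expressions are nested tuples ("rel", op, lhs, rhs), ("and", e1, e2), ("or", e1, e2), ("not", e); the instruction strings and the fresh labels L1, L2, ... are invented for illustration and belong to no real instruction set:

```python
from itertools import count

def jump_code(expr, true_label, false_label):
    """Translate a Boolean expression tree into branch instructions."""
    fresh = count(1)
    code = []

    def gen(e, t, f):              # t, f: the inherited branch targets
        kind = e[0]
        if kind == "rel":
            _, op, lhs, rhs = e
            code.append(f"if {lhs} {op} {rhs} goto {t}")
            code.append(f"goto {f}")
        elif kind == "and":
            mid = f"L{next(fresh)}"
            gen(e[1], mid, f)      # left false: whole expression false
            code.append(f"{mid}:")
            gen(e[2], t, f)
        elif kind == "or":
            mid = f"L{next(fresh)}"
            gen(e[1], t, mid)      # left true: whole expression true
            code.append(f"{mid}:")
            gen(e[2], t, f)
        elif kind == "not":
            gen(e[1], f, t)        # negation just swaps the targets
        else:
            raise ValueError(f"unknown node kind: {kind}")

    gen(expr, true_label, false_label)
    return code

code = jump_code(("and", ("rel", "<", "a", "b"), ("rel", ">", "c", "d")),
                 "L_then", "L_else")
print("\n".join(code))
```

No value of the Boolean expression is ever stored; control simply reaches L_then or L_else. A real generator would also drop the redundant "goto L1" that falls through to "L1:".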

Juxtaposition in ML

ML-family languages dispense with the parentheses altogether, except when they are required for disambiguation: max (2 + 3) 4;; ⇒ 5

Expressions vs Statements

Many imperative languages distinguish between: expressions, which always produce a value, and may or may not have side effects, and statements, which are executed solely for their side effects, and return no useful value. Given the centrality of assignment, imperative programming is sometimes described as "computing by means of side effects."

Arithmetic Overflow in Languages

Many languages, including Pascal and most of its descendants, provide dynamic semantic checks to detect arithmetic overflow. In some implementations these checks can be disabled to eliminate their run-time overhead. In C and C++, signed integer overflow has undefined behavior, while unsigned arithmetic wraps around modulo 2^n. In Java, overflow is well defined: the language definition specifies the size of all numeric types, and requires two's complement integer and IEEE floating-point arithmetic. In C#, the programmer can explicitly request the presence or absence of checks by tagging an expression or statement with the checked or unchecked keyword. In a completely different vein, Scheme, Common Lisp, and several scripting languages place no a priori limit on the size of integers; space is allocated to hold extra-large values on demand.
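Python is one of the scripting languages with unbounded integers. The sketch below contrasts that with Java-style 32-bit two's complement wraparound, which it simulates with a hypothetical helper `wrap32` (Python itself never wraps):

```python
def wrap32(n):
    """Reduce n to a 32-bit two's complement value, as Java int arithmetic does."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n

biggest = 2**31 - 1          # largest 32-bit signed value
print(biggest + 1)           # 2147483648: Python allocates a bigger integer
print(wrap32(biggest + 1))   # -2147483648: the wrapped 32-bit result
print(2**100 == 1 << 100)    # True: exact arithmetic, no overflow
```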

Cambridge Polish prefix notation

Most imperative languages use infix notation for binary operators and prefix notation for unary operators and (with parentheses around the arguments) other functions. Lisp uses prefix notation for all functions, but in what is known as Cambridge Polish notation, it places the function name inside the parentheses: (* (+ 1 3) 2) is the Lisp form of (1 + 3) * 2.
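A small evaluator makes the structure concrete: the first element inside each pair of parentheses names the function, and the rest are its arguments. This is a sketch, assuming integer literals and the binary operators +, -, and * only; the names `tokenize`, `parse`, and `evaluate` are hypothetical:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def tokenize(text):
    """Split text into parentheses, operator names, and integer literals."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build nested lists from the token list (consumes tokens from the front)."""
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)              # discard the closing ")"
        return expr
    return int(tok) if tok.lstrip("-").isdigit() else tok

def evaluate(expr):
    """Evaluate a binary Cambridge Polish expression."""
    if isinstance(expr, int):
        return expr
    op, lhs, rhs = expr            # the function name comes first, inside the parens
    return OPS[op](evaluate(lhs), evaluate(rhs))

print(evaluate(parse(tokenize("(* (+ 1 3) 2)"))))  # 8
```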

Space requirements of subrange type

Most implementations employ the same bit patterns for integers and subranges, so subranges whose values are large require large storage locations, even if the number of distinct values is small. The following type, for example: type water_temperature = 273..373; (* degrees Kelvin *) would be stored in at least two bytes. While there are only 101 distinct values in the type, the largest (373) is too large to fit in a single byte in its natural encoding. (An unsigned byte can hold values in the range 0.. 255; a signed byte can hold values in the range −128.. 127.)
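The arithmetic can be checked with a short sketch. The helper `bytes_for` is hypothetical; it assumes non-negative bounds and 8-bit bytes, and its biased option (storing value minus low) illustrates an encoding that, per the paragraph above, most implementations do not use:

```python
def bytes_for(low, high, biased=False):
    """Bytes needed to store values in low..high (assumes 0 <= low <= high)."""
    largest = high - low if biased else high    # biased: store value - low
    bits = max(1, largest.bit_length())
    return (bits + 7) // 8

print(bytes_for(273, 373))               # 2: natural encoding, 373 needs 9 bits
print(bytes_for(273, 373, biased=True))  # 1: offsets 0..100 need only 7 bits
```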

name equivalence

Name equivalence is based on the lexical occurrence of type definitions: roughly speaking, each definition introduces a new type. Name equivalence appears in Java, C#, standard Pascal, and most Pascal descendants, including Ada.

Composite Types

Nonscalar types are usually called composite types. They are generally created by applying a type constructor to one or more simpler types. Options, which we introduced in Example 7.6, are arguably the simplest composite types, serving only to add an extra "none of the above" to the values of some arbitrary base type. Other common composite types include records (structures), variant records (unions), arrays, sets, pointers, lists, and files. All but pointers and lists are easily described in terms of mathematical set operations (pointers and lists can be described mathematically as well, but the description is less intuitive).

Postfix Notation

Operator appears after its operands

Infix Notation

Operator appears among its operands

Alternative Implementations (Homework 2 question 3?)

Page 258: Alternative methods to compute the address to which to branch include sequential testing, hashing, and binary search.
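Two of these alternatives sketched in Python, for a hypothetical case statement on an integer grade. Binary search over sorted label ranges copes with large ranges; hashing (a dict) suits scattered single labels, since it needs one entry per value. The tables, names, and "invalid"/"others" defaults are assumptions for illustration:

```python
from bisect import bisect_right

# Hypothetical case statement:
#   90..100 => "A" | 80..89 => "B" | 70..79 => "C" | 0..69 => "F" | others => "invalid"
LOWS = [0, 70, 80, 90]       # sorted lower bounds of the arms' ranges
HIGHS = [69, 79, 89, 100]    # matching upper bounds
ARMS = ["F", "C", "B", "A"]

def select_by_binary_search(grade):
    """Find the arm whose range contains grade in O(log n) comparisons."""
    i = bisect_right(LOWS, grade) - 1    # last range starting at or below grade
    if i >= 0 and grade <= HIGHS[i]:
        return ARMS[i]
    return "invalid"

SCATTERED = {1: "one", 7: "seven", 1000: "thousand"}

def select_by_hash(label):
    """Expected O(1) lookup, one table entry per label value."""
    return SCATTERED.get(label, "others")

print(select_by_binary_search(85), select_by_binary_search(101))  # B invalid
print(select_by_hash(7), select_by_hash(8))                        # seven others
```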

Postfix notation (homework 1)

Postfix notation is used for most functions in Postscript, Forth, the input language of certain hand-held calculators, and the intermediate code of some compilers. Postfix appears in a few places in other languages as well. Examples include the pointer dereferencing operator (^) of Pascal and the post-increment and decrement operators (++ and --) of C and its descendants.
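Postfix needs no parentheses or precedence rules because it can be evaluated with a single operand stack, which is why calculators and stack-based intermediate codes favor it. A minimal sketch, assuming space-separated tokens and four binary operators; `eval_postfix` is a hypothetical name:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_postfix(text):
    """Evaluate a space-separated postfix expression with an operand stack."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            rhs = stack.pop()      # operands were pushed left to right,
            lhs = stack.pop()      # so the right operand comes off first
            stack.append(OPS[tok](lhs, rhs))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("1 3 + 2 *"))  # 8.0, i.e. (1 + 3) * 2
```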

Precedence

Precedence rules specify that certain operators, in the absence of parentheses, group "more tightly" than other operators. In most languages multiplication and division group more tightly than addition and subtraction, so 2 + 3 × 4 is 14 and not 20.

Distinguished Values for enums

Several languages allow the programmer to specify the ordinal values of enumeration types, if the default assignment is undesirable. In C, C++, and C#, one could write enum arm_special_regs {fp = 7, sp = 13, lr = 14, pc = 15};

Short-circuit evaluation design and implementation

Short-circuit evaluation is one of those happy cases in programming language design where a clever language feature yields both more useful semantics and a faster implementation than existing alternatives. Other at least arguable examples include case statements, local scopes for for loop indices (Section 6.5.1), and Ada-style parameter modes (Section 9.3.1).
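The "more useful semantics" part shows up in guards like the one below. Python's `and` short-circuits, so the right operand acts as code that runs only if the left one succeeds; the function name `ratio_exceeds` is hypothetical:

```python
def ratio_exceeds(a, b, limit):
    # "and" evaluates its right operand only when the left one is true,
    # so a / b is never attempted when b == 0.
    return b != 0 and a / b > limit

print(ratio_exceeds(10, 0, 1))  # False, with no ZeroDivisionError
print(ratio_exceeds(10, 2, 1))  # True
```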

Why do most languages leave unspecified the order in which the arguments of an operator or function are evaluated?

Two issues are involved: side effects, which can make the result depend on evaluation order, and code improvement (p. 240, Examples 6.29 and 6.30). Because of the importance of code improvement, most language manuals say that the order of evaluation of operands and arguments is undefined. (Java and C# are unusual in this regard: they require left-to-right evaluation.) In the absence of an enforced order, the compiler can choose whatever order is likely to result in faster code.
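The sketch below makes evaluation order observable through a side effect. Python, like Java and C#, specifies left-to-right evaluation, so the log order is fixed; in a language such as C, where the order of operand evaluation is unspecified, either order would be legal. The helper `f` is hypothetical:

```python
log = []

def f(tag, value):
    log.append(tag)       # side effect records when this operand was evaluated
    return value

result = f("left", 2) - f("right", 3)
print(result, log)        # -1 ['left', 'right']
```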

Exception-handling mechanism

The auxiliary Booleans can be eliminated by using a nonlocal goto or multilevel return, but the caller to which we return must still inspect status codes explicitly. As a structured alternative, many modern languages provide an exception-handling mechanism for convenient, nonlocal recovery from exceptions. Typically the programmer appends a block of code called a handler to any computation in which an exception may arise. The job of the handler is to take whatever remedial action is required to recover from the exception. If the protected computation completes in the normal fashion, execution of the handler is skipped.
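In Python the protected computation is a `try` block and the handler is an `except` clause. A minimal sketch with a hypothetical `parse_ratio` function; when the protected code completes normally, the handler is skipped:

```python
def parse_ratio(text):
    """Return a / b for text of the form "a/b", or None if that fails."""
    try:                                      # the protected computation
        num, den = text.split("/")
        return int(num) / int(den)
    except (ValueError, ZeroDivisionError):   # the handler: remedial action
        return None

print(parse_ratio("1/4"))  # 0.25 (handler skipped)
print(parse_ratio("1/0"))  # None (ZeroDivisionError handled)
print(parse_ratio("abc"))  # None (ValueError handled)
```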

Cleaning up continuations

The implementation of continuations in Scheme and Ruby is surprisingly straightforward. Because local variables have unlimited extent in both languages, activation records must in general be allocated on the heap. As a result, explicit deallocation of frames in the current context is neither required nor appropriate when jumping through a continuation: if those frames are no longer accessible, they will eventually be reclaimed by the standard garbage collector.

Nondeterminacy

The ordering or choice among statements or expressions is deliberately left unspecified, implying that any alternative will lead to correct results. Some languages require the choice to be random, or fair, in some formal sense of the word.

immutable vs mutable values

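For integers, the value model and the reference model have the same practical effect, because integers are immutable: the value of 2 never changes, so we can't tell the difference between two copies of the number 2 and two references to "the" number 2. Mutable values, by contrast, can change, so the difference between copies and shared references becomes visible.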

dangling else problem

The problem of determining with which if statement an else clause should be paired.

Why do most languages not allow the bounds or increment of an enumeration-controlled loop to be floating-point numbers?

The problem with real-number sequences is that limited precision can cause comparisons (e.g., between the index and the bound) to produce unexpected or even implementation-dependent results when the values are close to one another. Should for x := 1.0 to 2.0 by 1.0 / 3.0 execute three iterations or four? It depends on whether 1.0 / 3.0 is rounded up or down. (Page 264)
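The question's example, simulated in Python with IEEE double-precision floats. Here 1.0 / 3.0 rounds down, so after three steps the index is just below 2.0 and a fourth iteration runs; a different rounding could change the count. The names `float_loop` and `counted_loop` are hypothetical. Python's own for loop sidesteps the issue because range() accepts only integers:

```python
def float_loop(low, high, step):
    """Index values visited by a loop that adds step until x exceeds high."""
    values = []
    x = low
    while x <= high:       # the index-vs-bound comparison in question
        values.append(x)
        x += step          # rounding error accumulates here
    return values

def counted_loop(low, step, count):
    """Safer: drive the loop with an integer and compute x from it."""
    return [low + i * step for i in range(count)]

values = float_loop(1.0, 2.0, 1.0 / 3.0)
print(len(values), values[-1])  # 4 1.9999999999999998

try:
    range(1.0, 2.0)
    float_range_ok = True
except TypeError:
    float_range_ok = False      # range() rejects floating-point bounds
```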

Multiple sizes of integers

The space savings possible with (small-valued) subrange types in Pascal and Ada is achieved in several other languages by providing more than one size of built-in integer type. C and C++, for example, support integer arithmetic on signed and unsigned variants of char, short, int, long, and long long types, with monotonically nondecreasing sizes.

prefix, infix, postfix

These terms indicate, respectively, whether the function name appears before, among, or after its several arguments. A language may employ prefix, infix, or postfix notation to specify the location of operators/functions (relative to their operands/arguments in an expression).

Concurrency

Two or more program fragments are to be executed/evaluated "at the same time," either in parallel on separate processors, or interleaved on a single processor in a way that achieves the same effect.

Why is the distinction between mutable and immutable values important in the implementation of a language with a reference model of variables?

Under the reference model, it becomes important to distinguish between variables that refer to the same object and variables that refer to different objects whose values happen (at the moment) to be equal.

Ways to resolve dangling-else

There are two approaches: use a disambiguating semantic rule, or rewrite the grammar to remove the ambiguity. C uses a disambiguating rule known as the most closely nested rule: the else is associated with the closest prior if statement that does not already have an else part. Some languages, such as Ada and Ruby, solve the problem by introducing a bracketing keyword into the grammar (end if in Ada, end in Ruby).

Advantages of subrange types

Using an explicit subrange has several advantages. For one thing, it helps to document the program. Because the compiler analyzes a subrange declaration, it knows the expected range of subrange values, and can generate code to perform dynamic semantic checks to ensure that no subrange variable is ever assigned an invalid value. In addition, since the compiler knows the number of values in the subrange, it can sometimes use fewer bits to represent subrange values than it would need to represent arbitrary integers. For example, values of the Ada type test_score (range 0..100) can be stored in a single byte.

Converting to and from enum type

Values of an enumeration type are typically represented by small integers, usually a consecutive range of small integers starting at zero. In many languages these ordinal values are semantically significant, because built-in functions can be used to convert an enumeration value to its ordinal value, and sometimes vice versa. In Ada, these conversions employ the attributes 'pos and 'val: weekday'pos(mon) = 1 and weekday'val(1) = mon.

When are variables in the stack or heap allocated?

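At run time. Consequently, variables allocated in the stack or heap must also be initialized at run time; only statically allocated variables can have their initial values preallocated by the compiler.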

Value vs Reference models

With a value model of variables, any integer variable can contain the value 2. With a reference model of variables, there is (at least conceptually) only one 2—a sort of Platonic Ideal— to which any variable can refer.

Wrapper Class

a class designed to wrap a value of a primitive data type so that the value can behave like a reference type and be used where the language expects an object; for example, Java's Integer class wraps an int

anonymous type

a type that is written out directly in a variable declaration (for example, by listing its values there) rather than being declared separately and given a name

Operand

an argument of an operator

Side effect

an observable change to program state (memory) or any interaction with the external world (input and/or output). An expression or function is said to have side effects if, in addition to returning a value, it also modifies some state (a global or static variable, or one of its arguments) or causes input or output to take place. Many languages allow expressions to have side effects; for example, i++ in C yields the old value of i and also increments i. A program with no side effects at all would be useless, since it could not even produce output.

Files

are intended to represent data on mass-storage devices, outside the memory in which other program objects reside. Like arrays, most files can be conceptualized as a function that maps members of an index type (generally integer) to members of a component type. Unlike arrays, files usually have a notion of current position, which allows the index to be implicit in consecutive operations. Files often display idiosyncrasies inherited from physical input/output devices. In particular, the elements of some files must be accessed in sequential order.

Pointers

are l-values. A pointer value is a reference to an object of the pointer's base type. Pointers are often but not always implemented as addresses. They are most often used to implement recursive data types. A type T is recursive if an object of type T may contain one or more references to other objects of type T.

Arrays

are the most commonly used composite types. An array can be thought of as a function that maps members of an index type to members of a component type. Arrays of characters are often referred to as strings, and are often supported by special-purpose operations not available for other arrays.

Booleans (logicals)

are typically implemented as single-byte quantities, with 1 representing true and 0 representing false. In a few languages and implementations, Booleans may be packed into arrays using only one bit per value. As noted in Section 6.1.2 ("Orthogonality"), C was historically unusual in omitting a Boolean type: where most languages would expect a Boolean value, C expected an integer, using zero for false and anything else for true. C99 introduced a new _Bool type, but it is effectively an integer that the compiler is permitted to store in a single bit. As noted in Section C-6.5.4, Icon replaces Booleans with a more general notion of success and failure.

Continuations

consists of a code address, a referencing environment that should be established (or restored) when jumping to that address, and a reference to another continuation that represents what to do in the event of a subsequent subroutine return. (The chain of return continuations constitutes a backtrace of the run-time stack.) In higher-level terms, a continuation is an abstraction that captures a context in which execution might continue. Continuations are fundamental to denotational semantics. They also appear as first-class values in several programming languages (notably Scheme and Ruby), allowing the programmer to define new control-flow constructs.

Normal Order Evaluation

each operation begins its evaluation before its operands are evaluated, and each operand is evaluated only if it is needed for the calculation of the operation. Normal-order evaluation is what naturally occurs in macros (Section 3.7). It also occurs in short-circuit Boolean evaluation (Section 6.1.5), call-by-name parameters (to be discussed in Section 9.3.1), and certain functional languages (to be discussed in Section 11.5). Normal-order evaluation is one of many examples we have seen where arguably desirable semantics have been dismissed by language designers because of fear of implementation cost. Other examples in this chapter include side-effect freedom (which allows normal order to be implemented via lazy evaluation), iterators (Section 6.5.3), and nondeterminacy (Section 6.7). As noted in Sidebar 6.2, however, there has been a tendency over time to trade a bit of speed for cleaner semantics and increased reliability. Within the functional programming community, Haskell and its predecessor Miranda are entirely side-effect free, and use normal-order (lazy) evaluation for all parameters. A delayed expression is sometimes called a promise. The mechanism used to keep track of which promises have already been evaluated is sometimes called memoization. Because applicative-order evaluation is the default in Scheme, the programmer must use special syntax not only to pass an unevaluated argument, but also to use it. In Algol 60, subroutine headers indicate which arguments are to be passed which way; the point of call and the uses of parameters within subroutines look the same in either case. ___________________________________________________________________________ FROM LECTURE: 1. Pass unevaluated arguments to the subroutine and evaluate them only when (if) the value is actually needed. 2. Also known as non-strict, lazy, or delayed evaluation. 3. Normal-order evaluation occurs in macro processing, short-circuit Boolean evaluation, call-by-name parameters, and certain functional languages. 4. Haskell and Miranda use normal-order evaluation for all parameters.

Applicative order evaluation

arguments are evaluated before the subroutine is called, which is generally clearer and more efficient. 1. Evaluate all arguments before the subroutine is called. 2. Also known as strict evaluation. 3. Most languages use applicative-order evaluation. (Arguments are the data that you send to a function at the call; parameters are the names that receive that data inside the function.)

Subroutines need to have types if they are:

first- or second-class values (i.e., if they can be passed as parameters, returned by functions, or stored in variables). In each of these cases there is a construct in the language whose value is a dynamically determined subroutine; type information allows the language to limit the set of acceptable values to those that provide a particular subroutine interface (i.e., particular numbers and types of parameters). In a statically scoped language that never creates references to subroutines dynamically (one in which subroutines are always third-class values), the compiler can always identify the subroutine to which a name refers, and can ensure that the routine is called correctly without necessarily employing a formal notion of subroutine types.

Records (structs)

were introduced by Cobol, and have been supported by most languages since the 1960s. A record consists of a collection of fields, each of which belongs to a (potentially different) simpler type. Records are akin to mathematical tuples; a record type corresponds to the Cartesian product of the types of the fields.

Type checking

is the process of ensuring that a program obeys the language's type compatibility rules.

Lists

like arrays, contain a sequence of elements, but there is no notion of mapping or indexing. Rather, a list is defined recursively as either an empty list or a pair consisting of a head element and a reference to a sublist. While the length of an array must be specified at elaboration time in most (though not all) languages, lists are always of variable length. To find a given element of a list, a program must examine all previous elements, recursively or iteratively, starting at the head. Because of their recursive definition, lists are fundamental to programming in most functional languages.

Sets

like enumerations and subranges, were introduced by Pascal. A set type is the mathematical powerset of its base type, which must often be discrete. A variable of a set type contains a collection of distinct elements of the base type.

Lazy Evaluation

In both cases, what matters is that normal-order evaluation will sometimes not evaluate an argument at all, if its value is never actually needed. Scheme provides for optional normal-order evaluation in the form of built-in functions called delay and force. These functions provide an implementation of lazy evaluation. In the absence of side effects, lazy evaluation has the same semantics as normal-order evaluation, but the implementation keeps track of which expressions have already been evaluated, so it can reuse their values if they are needed more than once in a given referencing environment.

Expressions

the building blocks on which all higher-level ordering is based. Topics under this heading include the syntactic form of expressions, the precedence and associativity of operators, the order of evaluation of operands, and the semantics of the assignment statement.

subtype polymorphism

the code is designed to work with values of some specific type T, but the programmer can define additional types to be extensions or refinements of T, and the code will work with these subtypes as well. Subtype polymorphism appears primarily in object-oriented languages. With static typing, most of the work required to deal with multiple types can be performed at compile time: the principal run-time cost is an extra level of indirection on method invocations. Most languages that envision such an implementation, including C++, Eiffel, OCaml, Java, and C#, provide a separate mechanism for generics, also checked mainly at compile time. The combination of subtype and parametric polymorphism is particularly useful for container (collection) classes such as "list of T" (List<T>) or "stack of T" (Stack<T>), where T is initially unspecified, and can be instantiated later as almost any type.

parametric polymorphism

the code takes a type (or set of types) as a parameter, either explicitly or implicitly. Explicit parametric polymorphism, also known as generics (or templates in C++), typically appears in statically typed languages, and is usually implemented at compile time. The implicit version can also be implemented at compile time—specifically, in ML-family languages; more commonly, it is paired with dynamic typing, and the checking occurs at run time.

Type Equivalence

the method used to determine whether two types are equivalent, e.g. name equivalence or structural equivalence.

Enumeration Types

All possible values, which are named constants, are provided in the definition. An enumeration type consists of a set of named elements. In Pascal, one could write: type weekday = (sun, mon, tue, wed, thu, fri, sat); The values of an enumeration type are ordered, so comparisons are generally valid (mon < tue), and there is usually a mechanism to determine the predecessor or successor of an enumeration value (in Pascal, tomorrow := succ(today)). The ordered nature of enumerations facilitates the writing of enumeration-controlled loops: for today := mon to fri do begin ...

Assignment

Assignment is perhaps the most fundamental side effect: while the evaluation of an assignment may sometimes yield a value, what we really care about is the fact that it changes the value of a variable, thereby influencing the result of any later computation in which the variable appears.

Associativity

Associativity rules specify whether sequences of operators of equal precedence group to the right or to the left. Conventions here are somewhat more uniform across languages, but still display some variety. The basic arithmetic operators almost always associate left-to-right, so 9 − 3 − 2 is 4 and not 8.

Referentially transparent

At the opposite extreme, purely functional languages have no side effects. As a result, the value of an expression in such a language depends only on the referencing environment in which the expression is evaluated, not on the time at which the evaluation occurs. If an expression yields a certain value at one point in time, it is guaranteed to yield the same value at any point in time. In fancier terms, expressions in a purely functional language are said to be referentially transparent.

Binary Coded Decimal (BCD)

BCD devotes one nibble (four bits—half a byte) to each decimal digit. Machines that support BCD in hardware can perform arithmetic directly on the BCD representation of a number, without converting it to and from binary form. This capability is particularly useful in business and financial applications, which treat their data as both numbers and character strings.

L-Values

Because of their use on the left-hand side of assignment statements, expressions that denote locations are referred to as l-values. In C++ it is even possible for a function to return a reference to a structure, rather than a pointer to it, allowing one to write: g(a).b[c] = 2; In short, expressions that refer to memory locations are called l-values, and an assignment statement requires an l-value on its left-hand side. All l-values are r-values, but not all r-values are l-values.

Binary search

Binary search over the sorted case statement labels can accommodate ranges easily. It chooses an arm in O(log n) time, where n is the number of labels.

Short-circuit evaluation

Boolean expressions provide a special and important opportunity for code improvement and increased readability. Consider the expression (a < b) and (b < c). If a is greater than b, there is really no point in checking to see whether b is less than c; we know the overall expression must be false. Similarly, in the expression (a > b) or (b > c), if a is indeed greater than b there is no point in checking to see whether b is greater than c; we know the overall expression must be true. A compiler that performs short-circuit evaluation of Boolean expressions will generate code that skips the second half of both of these computations when the overall value can be determined from the first half. Short-circuit evaluation can save significant amounts of time in certain situations: if (unlikely_condition && expensive_function())

Post-test loop in C

C provides a post-test loop whose condition works "the other direction" (i.e., "while" instead of "until"): do { line = read_line(stdin); } while (line[0] != '$');

Characters (type)

Characters have traditionally been implemented as one-byte quantities as well, typically (but not always) using the ASCII encoding. More recent languages (e.g., Java and C#) use a two-byte representation designed to accommodate (the commonly used portion of) the Unicode character set. Unicode is an international standard designed to capture the characters of a wide variety of languages (see Sidebar 7.3). The first 128 characters of Unicode (\u0000 through \u007f) are identical to ASCII. C and C++ provide both regular and "wide" characters, though for wide characters both the encoding and the actual width are implementation dependent. Fortran 2003 supports four-byte Unicode characters.

Type compatibility rules

Determine when a value of a given type can be used in a given context

Errors and Other Exceptions

Eiffel formalizes this notion by saying that every software component has a contract—a specification of the function it performs. A component that is unable to fulfill its contract is said to fail. Rather than return in the normal way, it must arrange for control to "back out" to some context in which the program is able to recover. Conditions that require a program to "back out" are usually called exceptions.

Continuation-Passing Style

Even for functions that are not tail-recursive, automatic, often simple transformations can produce tail-recursive code. The general case of the transformation employs conversion to what is known as continuation-passing style. In effect, a recursive function can always avoid doing any work after returning from a recursive call by passing that work into the recursive call, in the form of a continuation.

R-Values

Expressions that denote values (possibly the value stored in a location) are referred to as r-values. The term is sometimes used to describe the value of an expression and to distinguish it from an l-value.

General iterators

Following the lead of Clu, many modern languages allow enumeration-controlled loops to iterate over much more general finite sets—the nodes of a tree, for example, or the elements of a collection. We consider these more general iterators in Section 6.5.3.

Sequential testing

Sequential testing (as in an if... then ... else statement) is the method of choice if the total number of case statement labels is small. It chooses an arm in O(n) time, where n is the number of labels.

Assignment Operators

To eliminate the clutter and compile- or run-time cost of redundant address calculations, and to avoid the issue of repeated side effects, many languages, beginning with Algol 68, and including C and its descendants, provide so-called assignment operators to update a variable. Instead of: a = a + 1; we can write a += 1; In addition to being aesthetically cleaner, the assignment operator form guarantees that the address calculation and any side effects happen only once.

Type Equivalence rules

determine when the types of two values are the same.

Mid-test loop

Finally, as we noted in Section 6.2.1, it is sometimes appropriate to test the terminating condition in the middle of a loop. In many languages this "mid-test" can be accomplished with a special statement nested inside a conditional: exit in Ada, break in C, last in Perl. for (;;) { line = read_line(stdin); if (all_blanks(line)) break; consume_line(line); } Here the missing condition in the for loop header is assumed to always be true. (C programmers have traditionally preferred this syntax to the equivalent while (1), presumably because it was faster in certain early C compilers.)

Abstraction-based type

From the abstraction-based point of view, a type is an interface consisting of a set of operations with well-defined and mutually consistent semantics. For both programmers and language designers, types may also reflect a mixture of these viewpoints.

Denotational Type

From the denotational point of view, a type is simply a set of values. A value has a given type if it belongs to the set; an object has a given type if its value is guaranteed to be in the set

Structural Type

From the structural point of view, a type is either one of a small collection of built-in types (integer, character, Boolean, real, etc.; also called primitive or predefined types), or a composite type created by applying a type constructor (record, array, set, etc.) to one or more simpler types. (This use of the term "constructor" is unrelated to the initialization functions of object-oriented languages. It also differs in a more subtle way from the use of the term in ML.)

L-value and R-value in reference models

In a language that uses the reference model, every variable is an l-value. When it appears in a context that expects an r-value, it must be dereferenced to obtain the value to which it refers.

Jump Tables

In a typical implementation of a case statement, the "code" at the table's label is in fact an array of addresses, known as a jump table. It contains one entry for each integer between the lowest and highest values, inclusive, found among the case statement labels. The dispatch code first checks that the controlling expression is within the bounds of the array (if not, it should execute the others arm of the case statement). It then fetches the corresponding entry from the table and branches to it. A jump table is fast: it begins executing the correct arm of the case statement in constant time, regardless of the value of the controlling expression. It is also space efficient when the overall set of case statement labels is dense and does not contain large ranges. It can consume an extraordinarily large amount of space, however, if the set of labels is nondense, or includes large value ranges.

Universal Character Set (UCS)

The ISO 10646 international standard defines a Universal Character Set (UCS), intended to include all characters of all known human languages. Unicode is an expanded version of ISO 10646, maintained by an international consortium of software manufacturers. In addition to mapping tables, it covers such topics as rendering algorithms, directionality of text, and sorting and comparison conventions.

Zero-initialization

If a variable is not given an initial value explicitly in its declaration, the language may specify a default value. In C, for example, statically allocated variables for which the programmer does not provide an initial value are guaranteed to be represented in memory as if they had been initialized to zero. For most types on most machines, this is a string of zero bits, allowing the language implementation to exploit the fact that most operating systems (for security reasons) fill newly allocated memory with zeros. Zero-initialization applies recursively to the subcomponents of variables of user-defined composite types. Java and C# provide a similar guarantee for the fields of all class-typed objects, not just those that are statically allocated. Most scripting languages provide a default initial value for all variables, of all types, regardless of scope or lifetime.

Subranges in Ada

In Ada one would write: type test_score is new integer range 0..100; subtype workday is weekday range mon..fri; The range... portion of the definition in Ada is called a type constraint. In this example test_score is a derived type, incompatible with integers. The workday type, on the other hand, is a constrained subtype; workdays and weekdays can be more or less freely intermixed.

Sequencing

Statements are to be executed (or expressions evaluated) in a certain specified order, usually the order in which they appear in the program text. Like assignment, sequencing is central to imperative programming. In most imperative languages, lists of statements can be enclosed with begin... end or { ... } delimiters and then used in any context in which a single statement is expected. Such a delimited list is usually called a compound statement. A compound statement optionally preceded by a set of declarations is sometimes called a block. __________________________________________________________________________ LECTURE: Sequencing refers to the ability to execute statements (or expressions) in a certain specified order, usually the order in which they appear in the program text. Sequencing is central to imperative programming: computation is modeled with each statement in a sequence causing some side effect. Compound statement: a list of statements enclosed in delimiters (such as begin..end or {}). Block: a compound statement optionally preceded by a set of declarations. Sequencing is a useless operation unless the statements in the sequence (possibly excepting the last one, depending on the language) cause side effects.

structural equivalence

Structural equivalence is based on the content of type definitions: roughly speaking, two types are the same if they consist of the same components, put together in the same way. Structural equivalence is used in Algol-68, Modula-3, and (with various wrinkles) C and ML.

Explicit type conversion (type cast)

Suppose for the moment that we require in each of these cases that the types (expected and provided) be exactly the same. Then if the programmer wishes to use a value of one type in a context that expects another, he or she will need to specify an explicit type conversion (also sometimes called a type cast). Depending on the types involved, the conversion may or may not require code to be executed at run time. There are three principal cases: 1. The types would be considered structurally equivalent, but the language uses name equivalence. In this case the types employ the same low-level representation, and have the same set of values. The conversion is therefore a purely conceptual operation; no code will need to be executed at run time. 2. The types have different sets of values, but the intersecting values are represented in the same way. One type may be a subrange of the other, for example, or one may consist of two's complement signed integers, while the other is unsigned. If the provided type has some values that the expected type does not, then code must be executed at run time to ensure that the current value is among those that are valid in the expected type. If the check fails, then a dynamic semantic error results. If the check succeeds, then the underlying representation of the value can be used, unchanged. Some language implementations may allow the check to be disabled, resulting in faster but potentially unsafe code. 3. The types have different low-level representations, but we can nonetheless define some sort of correspondence among their values. A 32-bit integer, for example, can be converted to a double-precision IEEE floating-point number with no loss of precision. Most processors provide a machine instruction to effect this conversion. A floating-point number can be converted to an integer by rounding or truncating, but fractional digits will be lost, and the conversion will overflow for many exponent values. 
Again, most processors provide a machine instruction to effect this conversion. Conversions between different lengths of integers can be effected by discarding or sign-extending high-order bytes.

Value model

In d = a; the name a on the right-hand side refers to a value; in a = b + c; the name a on the left-hand side refers to a location. Both interpretations, value and location, are possible because a variable in C (as in many other languages) is a named container for a value. Under a value model of variables, a given expression can be either an l-value or an r-value, depending on the context in which it appears. _____________________________________________________________________________ LECTURE: In a value model, a variable is a named container for a value.

Type Inference Rules

define the type of an expression based on the types of its constituent parts or (sometimes) the surrounding context. In a language with polymorphic variables or parameters, it may be important to distinguish between the type of a reference or pointer and the type of the object to which it refers: a given name may refer to objects of different types at different times.

Variant records (unions)

differ from "normal" records in that only one of a variant record's fields (or collections of fields) is valid at any given time. A variant record type is the disjoint union of its field types, rather than their Cartesian product.

index of the loop

do i = 1, 10, 2 ... enddo Variable i is called the index of the loop. The expressions that follow the equals sign are i's initial value, its bound, and the step size.

