CS 3110 Final
How to define natural numbers using a stream
# let rec from n = Cons (n, fun () -> from (n + 1));;
val from : int -> int stream = <fun>
# let nats = from 0;;
val nats : int stream = Cons (0, <fun>)
Map
Binds keys to values. Also called associative array, dictionary, and symbol table.
Parser generators
Built on the theory of pushdown automata (like DFAs, but they also maintain a stack onto which they can push and pop symbols). The stack enables them to accept a bigger class of languages, known as context-free languages (CFLs).
Lexer generators
Built on the theory of DFAs. The input is a collection of regular expressions that describe the tokens of the language. The output is an automaton implemented in a high-level language. The automaton accepts or rejects sequences of characters from a file as valid tokens of the language.
Context-free languages (CFLs)
CFLs can express the idea that delimiters must be balanced—for example, that every opening parenthesis must be balanced by a closing parenthesis. Described by context-free grammars.
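For example, balanced parentheses can be described by this context-free grammar, written in the BNF style used later in these notes:
s ::= <empty> | ( s ) | s s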
Concurrency
Concurrent programs enable computations to overlap in duration, instead of being forced to happen sequentially.
Efficiency of insert, find, and remove in an association list
- insert is just a cons onto the front of the list, which is constant time, that is, O(1).
- find potentially requires examining all elements of the list, which is linear time, that is, O(n), where n is the number of bindings in the map.
- remove is the same complexity as find, O(n).
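A minimal association-list sketch (using standard library functions), from which these complexities follow:
let insert k v m = (k, v) :: m                           (* O(1): one cons *)
let find k m = List.assoc_opt k m                        (* O(n): scans the list *)
let remove k m = List.filter (fun (k', _) -> k' <> k) m  (* O(n): scans the list *)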
Handling Collisions
1) Chaining/closed addressing/open hashing: store multiple bindings at each array index. Array elements are called buckets.
2) Probing/open addressing/closed hashing: when adding a new binding to the hash table would create a collision, the insert operation instead finds an empty location in the array to put the binding.
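A hypothetical chaining sketch (not the lecture code): each array element is a bucket holding a list of bindings, with the standard library's Hashtbl.hash as the assumed hash function:
type ('k, 'v) t = ('k * 'v) list array
let create n : ('k, 'v) t = Array.make n []
let index tbl k = Hashtbl.hash k mod Array.length tbl
let insert k v tbl =
  let i = index tbl k in
  tbl.(i) <- (k, v) :: List.remove_assoc k tbl.(i)  (* replace any old binding *)
let find k tbl = List.assoc_opt k tbl.(index tbl k)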
Thunk
A function that is used just to delay computation, and in particular one that takes unit as input. Used for streams.
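A tiny sketch (hypothetical names): wrapping a computation in a thunk delays it until the thunk is applied to ():
let thunk = fun () -> print_endline "computing"; 42
(* nothing printed yet; the body is delayed *)
let result = thunk ()  (* prints "computing", evaluates to 42 *)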
Compiler
A program that implements a programming language. A compiler's primary task is translation. It takes as input a source program, typically expressed in a high-level language like Java, and produces as output a target program, typically expressed in a low-level language like MIPS.
Interpreter
A program that implements a programming language. An interpreter's primary task is execution. It takes as input a source program and directly executes that program without producing any target program.
Thread
A single sequential computation. Helps make concurrent programming easier. There can be many threads running at a time, either interleaved or in parallel depending on the hardware, and a scheduler handles choosing which threads are running at any given time.
Stream
An infinite list. Functions can express streams because a function delays evaluation: its body isn't evaluated until the function is applied. E.g.:
type 'a stream = Cons of 'a * (unit -> 'a stream)
back end of the compiler
Does code generation, including further optimization.
front end of the compiler
Does lexing, parsing, and semantic analysis. It produces an AST and associated symbol tables, then transforms the AST into an IR.
Compilation phases: 2) Parsing
During parsing, the compiler transforms the sequence of tokens into a tree called the abstract syntax tree (AST). The AST abstracts from the concrete syntax of the language, typically forgetting concrete details such as parentheses.
Compilation phases: 3) Semantic analysis
During semantic analysis, the compiler checks to see whether the program is meaningful according to the rules of the language that the compiler is implementing. The most common kind of semantic analysis is type checking.
Desugaring
Eliminating syntactic sugar. An interpreter can desugar an AST into a simpler AST (in a sense, an IR).
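A hypothetical sketch (toy AST, not Jocalf's): desugaring sequencing into a wildcard let leaves a simpler core language with no ESeq nodes:
type expr =
  | EInt of int
  | ELet of string * expr * expr
  | ESeq of expr * expr
let rec desugar = function
  | EInt _ as e -> e
  | ELet (x, e1, e2) -> ELet (x, desugar e1, desugar e2)
  | ESeq (e1, e2) -> ELet ("_", desugar e1, desugar e2)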
Efficiency of Array Map
Every operation is constant time, O(1). The cost is that keys are forced to be integers, and they need to be small integers, or else the arrays will be huge.
Compilation phases: 5) Target code generation
Generating target code from the IR. Typically involves selecting concrete machine instructions (such as x86 opcodes), and determining which variables will be stored in memory (which is slow to access) vs. processor registers (which are fast to access but limited in number). A compiler attempts to optimize the performance of the target code, e.g. eliminating redundant computations.
Closure
Has two parts:
- a code part, which contains a function fun x -> e, and
- an environment part, which contains the environment env at the time that function was defined.
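For example (make_adder is a hypothetical name), the closure below pairs the code fun x -> x + y with an environment binding y:
let make_adder y = fun x -> x + y
let add5 = make_adder 5  (* closure: code [fun x -> x + y], env {y = 5} *)
let _ = add5 2           (* evaluates to 7 *)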
Compilation phases: 4) Translation to intermediate representation (IR)
IR is a kind of abstraction of many assembly languages. Many source languages (e.g., C, Java, OCaml) could be translated to the same IR, and from that IR, many target language outputs (e.g., x86, ARM, MIPS) could be produced.
Lwt Promise
In Lwt, a promise is a write-once reference: a value that is permitted to mutate at most once. When created, it is like an empty box that contains nothing; we say the promise is pending. Eventually the promise can be resolved, which is like putting something inside the box. Instead of being resolved, the promise can be rejected, in which case the box is filled with an exception.
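A minimal sketch of this lifecycle using Lwt.wait (which creates a pending promise together with its resolver) and Lwt.state (which observes the box):
let p, r = Lwt.wait ()    (* p is pending; r is its resolver *)
let _ = Lwt.state p       (* Lwt.Sleep: the box is empty *)
let () = Lwt.wakeup r 42  (* resolve the promise with 42 *)
let _ = Lwt.state p       (* Lwt.Return 42: the box is filled *)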
Chaining vs Probing
In probing, the time required to search for or add an element grows rapidly as the hash table fills up. Chaining is usually to be preferred over probing: the performance of chaining degrades more gracefully. And chaining is usually faster than probing, even when the hash table is not nearly full.
Array Map
Keys are integers. Mutable. Size has to be declared when the array is created.
middle end of the compiler (doesn't always exist)
Operates on the IR. Usually this involves performing optimizations that are independent of the target language.
Compilation phases: 1) Lexing
The compiler transforms the original source code of the program from a sequence of characters to a sequence of tokens. Tokens are adjacent characters that have some meaning when grouped together, e.g. "if" and "match" are tokens in OCaml. Lexing typically removes whitespace.
Hash Table
The key idea is that we assume the existence of a hash function hash : 'a -> int that can convert any key to a non-negative integer. Then we can use that function to index into an array, as we did with array maps.
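A one-line sketch of the indexing idea, assuming the standard library's polymorphic Hashtbl.hash as the hash function:
let index a k = Hashtbl.hash k mod Array.length a  (* any key to a valid array index *)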
Backus-Naur Form (BNF)
The standard way to describe the syntax of a language. Has the form:
metavariable ::= symbols | ... | symbols
E.g.:
e ::= i | e + e
i ::= <integers>
Linear probing vs Double hashing
Two methods of probing.
1) Linear probing: search ahead through the array indices with a fixed stride (often 1), looking for an unused entry. Tends to produce a lot of clustering of elements in the table, leading to bad performance.
2) Double hashing: a better strategy; use a second hash function to compute the probing interval.
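A sketch of the two probe sequences (hypothetical names; h, h1, h2 are hash values for the key, i the attempt number, m the table size):
let linear_probe h i m = (h + i) mod m                 (* fixed stride of 1 *)
let double_hash_probe h1 h2 i m = (h1 + i * h2) mod m  (* interval from a second hash *)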
AST
abstract syntax tree. E.g. in Jocalf, it was the ast.ml file:
type binop = BPlus | BMinus | ...
type expr = EBool of bool | EInt of int | ...
Fibonacci numbers with laziness
let rec fibs =
  Cons (1, fun () -> Cons (1, fun () -> sum fibs (tl fibs)))
let fib30lazy = lazy (take 30 fibs |> List.rev |> List.hd)
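This assumes the helpers below (a sketch matching the stream type above; sum adds two streams elementwise):
let hd (Cons (h, _)) = h
let tl (Cons (_, t)) = t ()
let rec take n s = if n = 0 then [] else hd s :: take (n - 1) (tl s)
let rec sum (Cons (h1, t1)) (Cons (h2, t2)) =
  Cons (h1 + h2, fun () -> sum (t1 ()) (t2 ()))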
ArrayMap implementation
module ArrayMap = struct
  (* AF: [|Some v0; Some v1; ...|] represents {0:v0, 1:v1, ...}.
     But if element [i] of [a] is None, then [i] is not bound in the map. *)
  type 'v t = 'v option array
  let create n = Array.make n None
  let insert k v a = a.(k) <- Some v
  let find k a = a.(k)
  let remove k a = a.(k) <- None
end
Laziness
module Lazy : sig
  type 'a t = 'a lazy_t
  val force : 'a t -> 'a
end
- A value of type 'a Lazy.t is a value of type 'a whose computation has been delayed. Intuitively, the language is being lazy about evaluating it: it won't be computed until specifically demanded. Demand is expressed by forcing the evaluation with Lazy.force, which takes the 'a Lazy.t and causes the 'a inside it to finally be produced.
- The first time a lazy value is forced, the computation might take a long time. But the result is cached (aka memoized), and any subsequent time that lazy value is forced, the memoized result will be returned immediately.
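A small example of forcing and memoization:
let v = lazy (print_endline "computing"; 42)
let a = Lazy.force v  (* prints "computing"; a = 42 *)
let b = Lazy.force v  (* prints nothing: the memoized 42 is returned *)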
Concurrency Method 1: Interleaving
rapidly switch back and forth between computations.
Rule of dynamic scope
the body of a function is evaluated in the current dynamic environment at the time the function is applied, not the old dynamic environment that existed at the time the function was defined.
Rule of lexical scope
the body of a function is evaluated in the old dynamic environment that existed at the time the function was defined, not the current environment when the function is applied.
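An example contrasting the two rules; OCaml uses lexical scope, so f still sees the x that existed when f was defined:
let x = 1
let f () = x  (* captures x = 1 in f's closure *)
let x = 2     (* shadows the old x; does not affect f *)
let _ = f ()  (* lexical scope: 1. Dynamic scope would give 2. *)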
Promises
the idea of a computation that is not yet finished: it has promised to eventually produce a value in the future, but the completion of the computation has been deferred or delayed.
Concurrency Method 2: Parallelism
use hardware that is capable of performing two or more computations literally at the same time.
Example of implementing a language using a mixture of compilation and interpretation
virtual machines that execute bytecode (e.g. JVM); a compiler translates the source language into bytecode, and the virtual machine interprets the bytecode.