Software Language Engineering (CS3480)
Formal equivalent capability
Capability of a general-purpose language A to be used to implement an interpreter for a general-purpose language B (and vice versa)
Terminal
Symbol on the RHS of a derivation rule that never appears on the LHS of any rule. It's usually written in lowercase within single quotes (especially in ART)
RDP
Recursive Descent Parser
Binding
Relationship between a term variable and its corresponding sub-term
Pointer (in debugging)
von Neumann interpretation of the program counter
Set
{a0, a1, ..., a_n}
Priorities
From lowest to highest priority:
+, - : left recursive (recursion at the left-most symbol)
*, / : left recursive
** : right recursive
von Neumann computer's data resources
Γ = 〈I, S, P, T〉 with:
I: input list
S: store, S = {〈a, σ〉 | σ ∈ Σ, |Σ| < ∞, a ∈ ℕ}, which is a map binding identifiers to values
P: output
T: program term (or current state of the program), i.e. the lines of code of the program
eSOS: assign
Trace of 〈assign(X, 3), {}〉:
1. F_SOS(assign(X, 3), {})
   [4.11].C1 yields X
   [4.11].SC2 yields {X ↦ 3}
   [4.11] rewrites to 〈done, {X ↦ 3}〉
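The final rewrite of this trace can be mimicked in Python. This is a minimal sketch, assuming terms are tuples, the store is a dict and integers are values; `assign_rule` and `DONE` are hypothetical names, and the code does not reproduce the exact conditions C1/SC2 of rule [4.11].

```python
from typing import Optional

DONE = ("done",)  # hypothetical encoding of the done term

def assign_rule(term, store) -> Optional[tuple]:
    """Apply a sketch of the assign rule to the configuration <term, store>.

    Returns the new configuration (term', store') or None if the rule
    does not match.
    """
    # The conclusion's LHS must match assign(X, V).
    if not (isinstance(term, tuple) and len(term) == 3 and term[0] == "assign"):
        return None
    _, name, value = term
    # This sketch only handles an already-evaluated (integer) right-hand side.
    if not isinstance(value, int):
        return None
    # Build the updated store {..., X |-> V} without mutating the old one.
    new_store = {**store, name: value}
    # The conclusion's RHS: <done, new_store>.
    return DONE, new_store

print(assign_rule(("assign", "X", 3), {}))  # (('done',), {'X': 3})
```

Returning a fresh dict keeps each configuration immutable, which makes it easy to record every configuration of a trace.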
inner/abstract syntax
(Semi-)formal counterpart of the user language's syntax that stands on the computer's side
Tuple
<a0, a1, ..., a_n>; e.g. <a, b>: ordered pair, <a, b, c>: ordered triple
OSBTRD
Ordered Singleton Backtrack Recursive Descent Parsing: a very simple parsing technique in which no grammar properties are computed. Such parsers can be produced as a syntax-directed translation from BNF. The technique isn't used for production parsers because its running time can be exponential in the length of the input string for "nasty" grammars, and because it fails to recognise some strings that are in the language of the grammar (a limitation shared by LALR(1), SLR(1) and LL(1) parsers, and one that can affect any non-general parsing technology).
Informally, such a parser is a set of (possibly recursive) argumentless boolean functions (one per NT) that share a string buffer (the input) and a global integer cc holding the index of the current character. Each function has a local integer rc (restart character) that remembers the value of cc before it is modified. Each alternate production Xi → ρ_j is laid out as a nest of ifs, in which terminals are matched directly and ε requires no action. Each function also records which alternate succeeded: the parser has a global variable co containing the index of the next free slot in the oracle, which can be used to get a map of where the parser should have gone.
Algorithm: consider a grammar Γ = (N, T, Xs, P) where, as usual, N is a set of NTs {X1, X2, X3, ..., Xk}, T is a set of terminals, Xs is the start NT and P is an ordered set of productions {X → ρ | ρ ∈ (N ∪ T)*}. The order of the NTs and terminals doesn't matter, whereas the order of the productions within a particular NT Xi is significant.
Inline semantics can't be executed during parsing with this technique, because backtracking may undo decisions that have already been made.
von Neumann architectures
- Design architecture for an electronic digital computer with parts consisting of a processing unit containing an arithmetic logic unit and processor registers; a control unit containing an instruction register and program counter; a memory to store both data and instructions; external mass storage; and input and output mechanisms - Any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus.
Intermediate form
Also called internal representation, or model when the outer syntax is used to load data into the members of a set of classes; this allows a UML diagram of the classes' interrelationships to be "decorated" using the outer syntax
Derivation traversers
Applications that work directly with the results of syntax analysis: in general a set of derivations for our input string, but most often a single derivation obtained by disambiguating the results of our parser. Example: anti-plagiarism software that compares the derivation trees of 2+ programs (ignoring identifier names and individual lexemes)
Meta-variable
Arbitrary term that acts as a placeholder in a tree
Inherited attributes
Attributes that pass down the tree (like parameters). An attribute is inherited when the left-hand side of its semantic equation is an attribute of a child symbol (e.g. { E1.v = S.v; }). Example: X ::= Y Z Y { Z1.v = 0; }
Big step
Style of SOS that avoids explicitly reducing the left and right arguments step by step, by specifying that a term can transition directly to its value; this is symbolised by => instead of small step's ->. However, it loses the link between individual rewrites and machine-level operations, since an entire expression is rewritten in a single step, and the ability to reason in detail about exceptions is lost. On the other hand, the specification and the associated execution traces become more compact
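A big-step evaluator can be sketched as a recursive function: each call corresponds to one => judgement, and the recursive calls play the role of the premises. Terms are encoded as nested tuples such as ("add", l, r); this encoding and the name `big_step` are assumptions for illustration.

```python
def big_step(term):
    """Return the value v such that term => v, in a single judgement."""
    if isinstance(term, int):
        return term                  # values evaluate to themselves
    op, left, right = term
    # Premises: left => v1 and right => v2, obtained recursively
    # rather than through a sequence of intermediate terms.
    v1 = big_step(left)
    v2 = big_step(right)
    if op == "add":
        return v1 + v2
    if op == "sub":
        return v1 - v2
    raise ValueError(f"no big-step rule for {op!r}")

print(big_step(("add", ("add", 10, 2), 4)))  # 16
```

Note that no intermediate term such as ("add", 12, 4) is ever produced, which is exactly the compactness (and the loss of detail) described above.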
BNF
Backus-Naur Form, e.g.:
<expr> ::= <term> | <expr> <addop> <term>
S ::= X 'a' | #
X ::= 'x'
BNF described in BNF:
BNFSpec ::= RULE | RULE BNFSpec
RULE ::= &ID '::=' cats
cats ::= cat | cat '|' cats
cat ::= '#' | elements
elements ::= element | element elements
element ::= &ID | &STRING_SQ
⊥
Bottom, value representing `none`. Also used as the value for an uninitialised binding (i.e. x ↦ ⊥)
L-attributed class
Class of attribute grammars whose attributes can be resolved in a single top-down, left-to-right pass. It's naturally supported by RDPs. In every production X → Y0 Y1 Y2 ... Yk ... Yn, every inherited attribute of Yk depends only on the attributes of Y0 ... Y(k−1) and on the inherited attributes of X. This definition reflects the left-to-right construction order of the derivation tree
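As an illustration, here is a sketch of an RDP evaluating an L-attributed grammar. The hypothetical, left-recursion-free grammar E ::= &INTEGER T, T ::= '-' &INTEGER T | # passes the running value down as an inherited attribute of T (computed only from T's own inherited attribute and the INTEGER to its left) and returns the result as a synthesized attribute. Tokens are assumed to be pre-split strings.

```python
def evaluate(tokens):
    """E ::= &INTEGER T   { T.inh = INTEGER.v;  E.v = T.syn }"""
    pos = 0

    def integer():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"integer expected at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def T(inh):
        """T ::= '-' &INTEGER T1 { T1.inh = T.inh - INTEGER.v; T.syn = T1.syn }
             | #               { T.syn = T.inh }"""
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            # T1.inh depends only on T.inh and the INTEGER to its left,
            # so it is available during a single left-to-right pass.
            return T(inh - integer())
        return inh  # epsilon alternate: hand the inherited value back up

    result = T(integer())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("10 - 3 - 2".split()))  # 5
```

The inherited attribute lets a right-recursive grammar still compute the left-associative result (10 - 3) - 2.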
S-attributed class
Class of attribute grammars with only synthesized attributes. It's naturally supported by bottom-up parsing techniques such as LALR(1) (Bison/YACC).
Semantics
Collection of meanings
CFG
Context-Free Grammar, e.g.:
S ::= A 'c' | 'a' B
A ::= 'a' 'b'
B ::= 'b' 'c'
which generates the string 'abc' (through either alternative of S)
Data-centric language
Data description language (e.g. POV-Ray, XML)
DSL
Domain Specific Language
Right associativity
E ::= &INTEGER | &INTEGER '-' E
Left associativity
E ::= &INTEGER | E '-' &INTEGER (the usual human convention)
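The two grammars above give different results for the same string. The sketch below evaluates an already-tokenised expression according to each grammar's tree shape; the function names are hypothetical, and `eval_left` splits off the last operator directly instead of parsing left to right (a naive recursive descent parser for the left-recursive rule would not terminate).

```python
def eval_right(tokens):
    """E ::= &INTEGER | &INTEGER '-' E : the tree nests to the right."""
    if len(tokens) == 1:
        return int(tokens[0])
    if tokens[1] != "-":
        raise ValueError(f"expected '-', got {tokens[1]!r}")
    return int(tokens[0]) - eval_right(tokens[2:])

def eval_left(tokens):
    """E ::= &INTEGER | E '-' &INTEGER : the tree nests to the left."""
    if len(tokens) == 1:
        return int(tokens[0])
    if tokens[-2] != "-":
        raise ValueError(f"expected '-', got {tokens[-2]!r}")
    return eval_left(tokens[:-2]) - int(tokens[-1])

tokens = "10 - 3 - 2".split()
print(eval_left(tokens), eval_right(tokens))  # 5 9
```

Left associativity gives (10 - 3) - 2 = 5, matching the human convention; right associativity gives 10 - (3 - 2) = 9.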
X ↦ y
Element of S that maps the identifier X to a value y (i.e. a binding)
SOS conditions
Elements above the line of an inference rule, which may be side conditions (matches against the value returned by a function) or transitions
ε
Epsilon: the empty string, which matches without consuming any input (written # in Sandbox and ART)
GIFT
Gather-Insert-Fold-Tear: technique that rewrites the concrete (outer) tree under the control of a set of convenient tree annotations. ART only uses 2 fold operations:
^: fold under (suppress the annotated node and place its children into the parent, i.e. delete the node), e.g. A → {a X^ b}, X → {y 3} => A → {a y 3 b}
^^: fold over (overwrite the parent node with the annotated node), e.g. A → {a X^^ b}, X → {y 3} => X → {a y 3 b}
^^^: tear (suppress the entire subtree of the annotated node)
node:name ... [name]: insert a node named "name" somewhere in the tree, possibly followed by fold operations
The best way to think about the GIFT operators is as annotations loaded into the derivation tree; a GIFT rewriting phase then rewrites the derivation tree under the control of those operators into a Rewritten Derivation Tree (RDT)
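The fold and tear operators can be sketched as a bottom-up tree rewrite. This is a hypothetical Python model, not ART's implementation: each node carries an optional annotation string, and the insert operator is not modelled.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    gift: str = ""  # hypothetical: "", "^" (fold under), "^^" (fold over), "^^^" (tear)

def rewrite(node: Node) -> Node:
    """Return a rewritten copy of `node`, applying its children's annotations."""
    label, children = node.label, []
    for child in map(rewrite, node.children):   # rewrite subtrees first
        if child.gift == "^^^":                  # tear: drop the whole subtree
            continue
        if child.gift == "^":                    # fold under: splice its children in
            children.extend(child.children)
        elif child.gift == "^^":                 # fold over: child's label overwrites ours
            label = child.label
            children.extend(child.children)
        else:
            children.append(child)
    return Node(label, children, node.gift)

def show(node: Node) -> str:
    """Render a tree in the A -> {a b c} notation used above."""
    if not node.children:
        return node.label
    return f"{node.label} -> {{{' '.join(show(c) for c in node.children)}}}"

def example(gift: str) -> Node:
    """The tree A -> {a X b}, X -> {y 3}, with X annotated by `gift`."""
    x = Node("X", [Node("y"), Node("3")], gift)
    return Node("A", [Node("a"), x, Node("b")])

for op in ("^", "^^", "^^^"):
    print(op, show(rewrite(example(op))))
```

The three printed trees correspond to the fold under, fold over and tear examples listed above.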
Admissible Attribute Grammar specification
An attribute grammar specification is admissible if it can be computed by our predefined evaluation schedule; it's inadmissible otherwise
Reduction semantics transition graph
Graph displaying the transitions between machine states (listed in a set). Example: starting from 〈[3], "output(10+2+4);"〉, T = "output(10+2+4);" and P = [3]. T then becomes "output(12+4);", then "output(16);", after which 16 is output, so that P = [3, 16]
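The path through this graph can be reproduced with a small-step interpreter. This is a sketch under assumed encodings: output(10+2+4) is written in prefix form as ("output", ("add", ("add", 10, 2), 4)), the output P is a Python list, and `step`/`trace` are hypothetical names.

```python
DONE = ("done",)

def is_value(term):
    return isinstance(term, int)

def step(term, output):
    """Perform one reduction step on the configuration <output, term>."""
    if term[0] == "add":
        _, left, right = term
        if not is_value(left):                   # reduce the left argument first
            left, output = step(left, output)
            return ("add", left, right), output
        if not is_value(right):                  # then the right argument
            right, output = step(right, output)
            return ("add", left, right), output
        return left + right, output              # both are values: add them
    if term[0] == "output":
        _, arg = term
        if not is_value(arg):
            arg, output = step(arg, output)
            return ("output", arg), output
        return DONE, output + [arg]              # append to a new output list
    raise ValueError(f"no rule applies to {term!r}")

def trace(term, output):
    """List every configuration visited until the term is in normal form."""
    configs = [(output, term)]
    while term != DONE and not is_value(term):
        term, output = step(term, output)
        configs.append((output, term))
    return configs

for config in trace(("output", ("add", ("add", 10, 2), 4)), [3]):
    print(config)
```

Each printed configuration corresponds to one node on the path described above, ending with the output [3, 16].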
F_SOS
Interpreting function for SOS, which takes an input term and interprets it by searching the rules for possible transitions. It works together with θ, which represents what is left of the program.
Normal forms
Irreducible term, i.e. a term that contains no redexes; it's usually the empty term (done)
Small step
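Style of SOS describing term execution accurately, one small step at a time, close to machine instructions; transitions are written -> (rather than big step's =>)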
JIT
Just-In-Time: technique that compiles to machine code the statements which have been executed many times. It's basically a "warm-up" process performed by interpreters when they detect regions suitable for compilation
Non-Terminal
Symbol on the LHS of a derivation rule. It's usually written in uppercase (especially in ART).
Formal attribute grammar game
Let Γ = (T, N, S, P) where T is a set of terminals, N is a set of NTs (T ∩ N = ∅), V = T ∪ N, S ∈ N is the start NT, which must not appear on any RHS (so S must not be recursive), and P ⊆ N × (T ∪ (N \ {S}))* is a set of productions.
Each symbol X ∈ V has a finite set A(X) of attributes, partitioned into two disjoint sets: synthesized attributes As(X) and inherited attributes Ai(X). The inherited attributes of the start symbol (elements of Ai(S)) and the synthesized attributes of terminal symbols (elements of As(t), t ∈ T) are preinitialised before attribute evaluation commences: they have constant values.
Annotate the CFG as follows: if Γ has m productions, then let production p be X_{p,0} → X_{p,1} X_{p,2} ... X_{p,n_p}, where n_p ≥ 0, X_{p,0} ∈ N and X_{p,j} ∈ V for 1 ≤ j ≤ n_p.
A semantic rule is a function f_{p,j,a} defined for all 1 ≤ p ≤ m and 0 ≤ j ≤ n_p, where a ∈ As(X_{p,0}) if j = 0 and a ∈ Ai(X_{p,j}) if j > 0. Each such function maps V_{a_1} × V_{a_2} × ... × V_{a_t} into V_a for some t = t(p, j, a) ≥ 0, where V_a denotes the set of possible values of attribute a.
The 'meaning' of a string in L(Γ) is the value of some distinguished attribute of S, along with any side effects of the evaluation
l0 : l1
List concatenation of l0 and l1
Term variable
Name which stands for an arbitrary term (tree). It's a sort of metavariable (as opposed to the program variables, which are represented by elements of a program term). It's often written in italics
θ |> π
Operation of matching the closed term θ against the pattern π, which results in either a failure (represented by ⊥) or a set of bindings. Examples: seq(done, output(6)) |> seq(done, X) returns {X ↦ output(6)}; seq(done, output(6)) |> seq(X, done) returns ⊥
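The matching operation θ |> π can be sketched in Python as follows, assuming terms are nested tuples whose first element is the constructor name and term variables are instances of a hypothetical `Var` class; `None` stands for ⊥. Requiring a repeated variable to bind equal sub-terms is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical representation of a term variable such as X."""
    name: str

def match(theta, pi, bindings=None):
    """theta |> pi: return a dict of bindings, or None (standing for ⊥)."""
    if bindings is None:
        bindings = {}
    if isinstance(pi, Var):
        if pi.name in bindings:  # assumption: repeated variables must agree
            return bindings if bindings[pi.name] == theta else None
        return {**bindings, pi.name: theta}
    if isinstance(pi, tuple) and isinstance(theta, tuple):
        if len(pi) != len(theta) or pi[0] != theta[0]:
            return None          # different constructor or arity
        # Walk both trees in tandem, threading the bindings through.
        for sub_theta, sub_pi in zip(theta[1:], pi[1:]):
            bindings = match(sub_theta, sub_pi, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if theta == pi else None  # constants must be equal

done = ("done",)
theta = ("seq", done, ("output", 6))
print(match(theta, ("seq", done, Var("X"))))  # {'X': ('output', 6)}
print(match(theta, ("seq", Var("X"), done)))  # None
```

Note that a successful match with no variables returns an empty dict, so callers should test for `None` rather than for falsiness.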
Program counter
Pointer placed at a certain code line during the execution of a program. It's denoted Θ
Pattern substitution
Process by which we stitch sub-terms into a pattern to create a new closed term by substituting the bound sub-terms for meta-variables in the pattern
Program identity
Program transformation that doesn't change the semantics of a program term but does change the syntax and thus the reduction trace
parser generator
Program which reads specifications for a grammar Γ written in BNF (or EBNF) and outputs the source code for a parser, which will then (after being compiled) test strings to see if they are in the language L(Γ) and perhaps build a derivation tree
Tail call
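Call that is the last action performed by a function; in a recursive descent parser, a right-recursive call is a tail call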
SOS inference rule
Rule with 0+ conditions and a conclusion, read as follows: if the current configuration is 〈θ, α〉, and C1 succeeds and C2 succeeds and ... and Cn succeeds, then transition to the configuration 〈θ′, α′〉. In other words, if the current configuration matches the LHS of the conclusion and every condition succeeds, then rewrite the current configuration into the RHS of the conclusion. This operational view of the logical inference process is often referred to as "reading round the clock". Each inference rule has its own environment, whose bindings aren't affected by other rules
Syntax
Rules governing valid word orderings. Syntax is formed by syntax rules and a vocabulary, and it's the specification of the language itself
Mid-associativity
S ::= '(' E ')'
Non-associativity
S ::= E '+' E
Left recursion
S ::= S ... | ... (Sandbox can't handle this: its recursive descent parser would call S again without consuming any input and never terminate)
User attributes
Sandbox/RDP: X:int ... X:v; ART: X<v:int>
Semantic action
Sandbox/RDP: [** Sys.o.p('...'); **]; ART: { Sys.o.p('...'); }
Control flow
Sequence of values taken by the program counter during a program's execution for a particular input. The control flow of a program is the union of those sequences over every possible input
;
Sequencing operator: the LHS is evaluated first, then the RHS. Example: output(10+2+4);output(6), which in prefix form is seq(output(add(add(10, 2), 4)), output(6))
Environment
Set of bindings
BNF specification
Set of derivation rules
Vocabulary
Set of understood words in a language
Language
Set of word sequences
SPPF
Shared Packed Parse Forest
SOS conclusion
Single transition below the line of an inference rule
Ambiguity
Situation in which a sentence has multiple meanings
Synthesized attributes
Attributes that "move up" the tree (like a return result), giving a sort of inside-out evaluation. An attribute is synthesized when the left-hand side of its semantic equation is an attribute of the parent (the LHS NT of the production), e.g. S ::= E '+' E { S.v = E1.v + E2.v }. Example: X ::= Y Z Y { X.a = add(Y1.v, Y2.v) }
Configuration
State of a computer, which always contains a program term, together with entities representing side effects of the program term that need to be remembered. It's represented by a tuple of terms comprising at least a program term θ, and possibly a store term σ, an environment ρ, an output stream α, an input stream β and some signals ν. A configuration Γ1 is related to Γ2 if and only if Γ2 can appear immediately after Γ1 in the execution of some program
SOS
Structural Operational Semantics
Reducible expression (aka redex)
Sub-term that is replaced during the program term rewrite phase, which occurs after a prior term has been executed
π <| ρ
Substitution: operation of replacing term variables in pattern π with their bound terms from the environment ρ, resulting in a closed term.
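Substitution π <| ρ can be sketched under the same kind of assumptions as a matching sketch: tuple terms, a hypothetical `Var` class for term variables, and a dict for the environment ρ. Every term variable is replaced by its bound term; an unbound variable is reported as an error, since the result must be a closed term.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Hypothetical representation of a term variable such as X."""
    name: str

def substitute(pi, rho):
    """pi <| rho: replace every term variable in pi by its bound term."""
    if isinstance(pi, Var):
        if pi.name not in rho:
            raise KeyError(f"unbound term variable {pi.name}")
        return rho[pi.name]
    if isinstance(pi, tuple):
        # Keep the constructor, substitute inside every sub-term.
        return (pi[0],) + tuple(substitute(sub, rho) for sub in pi[1:])
    return pi  # constants are already closed

pattern = ("seq", Var("X"), ("output", Var("Y")))
print(substitute(pattern, {"X": ("done",), "Y": 6}))
```

Combined with matching, this gives the usual rewrite step: match the closed term against a rule's LHS pattern, then substitute the resulting bindings into its RHS pattern.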
Translation from language E to F
Change of syntax from that of E to that of F in such a way that the semantics is preserved
Numerical Tower
Taken from Scheme and used in ART: ARTValue (top-down): Quantity (15m4) - Number - Complex (15, 0) - Real (15.0) - Ratio (3/2) - Integer - {Integer32, Integer Arbitrary}
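Python's standard `numbers` module (PEP 3141) defines a comparable, Scheme-inspired tower (Number, Complex, Real, Rational, Integral). The sketch below uses it only as an analogy to show how each level includes the ones below it; it is not ART's ARTValue hierarchy, it has no Quantity level, and its Rational corresponds to the Ratio level above.

```python
import numbers
from fractions import Fraction

# Levels of Python's numeric tower, from the most general to the most specific.
LEVELS = [numbers.Number, numbers.Complex, numbers.Real,
          numbers.Rational, numbers.Integral]

def tower_levels(value):
    """Names of the tower levels that `value` belongs to."""
    return [cls.__name__ for cls in LEVELS if isinstance(value, cls)]

for value in (complex(15, 0), 15.0, Fraction(3, 2), 15):
    print(repr(value), tower_levels(value))
```

An integer belongs to every level, while a complex number only belongs to the top two, mirroring the top-down ordering above.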
Closed term
Term with no term variables
Patterns
Terms which may contain term variables
Bootstrap
To pick oneself up recursively
Interpreter
Tool that constructs an internal representation from the important elements of the derivation, then traverses that form, performing actions as it goes
Compiler
Tool that transforms input strings through a sequence of translations, resulting in efficient machine code that can subsequently be loaded for execution by some processor
ART
Tool using GLL with native EBNF parsing
Sandbox
Tool-chain developed by implementing an OSBTRD parser
Pattern matching of terms
Process of comparing a closed term to a pattern to decide whether they match and, if they do, constructing a table of term variables showing what each one represents. It's also a way to extract sub-trees (sub-terms) from within closed terms; the process goes through both trees in tandem. Example: the unconditional rule seq(done, X) → X rewrites any term matching seq(done, X) to X, regardless of what the term variable X stands for.
[Figure: a closed term matched against a pattern, with the term variables highlighted]
Engineering (in computing)
Usage of mathematics and scientific knowledge to invent new, useful systems and to improve the utility, performance, economics and reliability of existing systems
outer/concrete syntax
Human-centric syntax of the user language, which is translated into an intermediate form expressed in the inner/abstract syntax
Signals
Usually exceptions, but can also be break/continue statements
done
Value representing missing/empty terms
A =>* B
We can get from A to B in 0+ steps
WFF
Well-formed formulae, i.e. syntactically correct code. Example: output(10+2+4) in a language like:
S ::= 'output' '(' E ')'
E ::= E '+' E
E ::= E '*' E
E ::= INTEGER
List
[a0, a1, ..., a_n]
a | b
a OR b
a ::= b
a is defined as b