Compiler Design
Pitfalls in top-down parsing
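-Ambiguity: more than one production looks applicable at some step of a derivation. -Left recursion: productions of the form A -> A alpha, which can send a top-down parser into an infinite loop. -Backtracking: the parser might have to "back up" over input it has already consumed when a chosen production fails.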
Definition of FOLLOW sets
-For any nonterminal A, FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form. -If A can be the rightmost symbol in some sentential form, S =>* alpha A, then $ is in FOLLOW(A).
Definition of FIRST sets
-If alpha is any string of grammar symbols, then FIRST(alpha) is the set of terminals that begin strings derived from alpha. -If alpha =>* epsilon, then epsilon is also in FIRST(alpha).
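The FIRST and FOLLOW definitions can be turned into a fixed-point computation. The sketch below is one way to do it, not a definitive implementation. The grammar (the usual expression grammar with E' and T'), the names GRAMMAR, EPS and END, and the helper functions are all assumptions made for illustration.

```python
EPS = "epsilon"   # marker for the empty string (assumed spelling)
END = "$"         # end-of-input marker

# Hypothetical example grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        if sym not in GRAMMAR:           # terminal: it begins the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # this symbol cannot vanish, so stop
            return result
    result.add(EPS)                      # every symbol can derive epsilon
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)               # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue         # FOLLOW is only defined for nonterminals
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:      # sym can end the body: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

For this grammar, FIRST["E'"] contains "+" and epsilon, and FOLLOW["F"] contains "*", "+", ")" and "$".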
How do bottom-up parsers work?
-Parse tree is constructed starting at the leaves. -Rightmost derivation is found, in reverse.
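To see "rightmost derivation in reverse" concretely, here is a minimal hand-written shift-reduce sketch. It assumes a toy grammar, E -> E + n | n, and recognizes handles by looking at the top of the stack. Real bottom-up parsers use generated tables, so treat the function name and the handle checks as illustrative only.

```python
def shift_reduce(tokens):
    """Parse tokens for the toy grammar  E -> E + n | n.

    Returns the reductions in the order they were applied.
    """
    stack, reductions = [], []
    for tok in tokens:
        stack.append(tok)                        # shift
        while True:                              # reduce while a handle is on top
            if stack[-3:] == ["E", "+", "n"]:
                stack[-3:] = ["E"]
                reductions.append("E -> E + n")
            elif stack == ["n"]:                 # the very first n starts the tree
                stack[:] = ["E"]
                reductions.append("E -> n")
            else:
                break
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce stack {stack} to E")
    return reductions


print(shift_reduce("n + n + n".split()))
# ['E -> n', 'E -> E + n', 'E -> E + n']
```

The rightmost derivation is E => E + n => E + n + n => n + n + n, which applies E -> E + n twice and then E -> n. The parser reports the same productions in the opposite order.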
How do top-down parsers work?
-Parse tree is constructed starting at the root. -Leftmost derivation is found.
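A recursive-descent parser is one common kind of top-down parser. The sketch below assumes a toy grammar (E -> T E', E' -> + T E' | epsilon, T -> n) and records each production as it is expanded, which yields a leftmost derivation. The class and method names are made up for this example.

```python
class Parser:
    """Recursive-descent sketch for  E -> T E',  E' -> + T E' | epsilon,  T -> n."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks end of input
        self.pos = 0
        self.steps = []                      # productions, in expansion order

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):
        self.steps.append("E -> T E'")
        self.parse_T()
        self.parse_E_tail()

    def parse_E_tail(self):
        if self.peek() == "+":               # one token of lookahead picks the rule
            self.steps.append("E' -> + T E'")
            self.match("+")
            self.parse_T()
            self.parse_E_tail()
        else:
            self.steps.append("E' -> epsilon")

    def parse_T(self):
        self.steps.append("T -> n")
        self.match("n")


def leftmost_derivation(tokens):
    parser = Parser(tokens)
    parser.parse_E()                         # start at the root symbol E
    parser.match("$")                        # all input must be consumed
    return parser.steps


print(leftmost_derivation("n + n".split()))
# ["E -> T E'", 'T -> n', "E' -> + T E'", 'T -> n', "E' -> epsilon"]
```

Each recorded step rewrites the leftmost nonterminal: E => T E' => n E' => n + T E' => n + n E' => n + n.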
Phases of Compiler
1. Lexical Analyzer 2. Syntax Analyzer 3. Semantic Analyzer 4. Intermediate Code Generator 5. Code Optimizer 6. Final Code Generator
Loader
A part of the operating system that is responsible for loading executable files into memory and executing them.
Compiler
A program that converts high-level language to assembly language.
Assembler
A program that converts the assembly language to machine-level language.
Linker
A program that links and merges various object files together in order to make an executable file.
Preprocessor
A tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
Interpreter
Another program that translates high-level language into low-level machine language. It reads a statement from the input, converts it to intermediate code, executes it, then takes the next statement in sequence.
Semantic Analyzer
Checks whether the parse tree that was constructed follows the rules of the language. It produces an annotated syntax tree as output.
Symbol Table
A data structure that is maintained throughout all the phases of a compiler. All identifier names are stored along with their types. It lets the compiler quickly search for an identifier's record and retrieve it.
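A symbol table is often built on hash maps so that lookups stay fast. The sketch below is one possible shape, assuming nested scopes (which the definition above does not mention) and storing only a type string per identifier. The class and method names are hypothetical.

```python
class SymbolTable:
    """Maps identifier names to types, with a stack of nested scopes (assumed)."""

    def __init__(self):
        self.scopes = [{}]                   # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = type_name

    def lookup(self, name):
        """Return the type of name, searching inner scopes first; None if absent."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("count", "int")
table.enter_scope()
table.declare("count", "float")              # shadows the outer declaration
print(table.lookup("count"))                 # float
table.exit_scope()
print(table.lookup("count"))                 # int
```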
Code Optimization
Improves the speed or size of an intermediate code program by analyzing the code: -eliminating redundant instructions. -replacing expensive operations with cheaper ones. -reducing how often some instructions are executed.
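Two of these improvements can be shown on a tiny scale. The sketch below assumes intermediate code stored as (dest, op, arg1, arg2) tuples, a format invented for this example. It drops self-copies as redundant instructions and replaces a multiplication by 2 with an addition.

```python
def peephole(code):
    """Apply two local optimizations to a list of (dest, op, arg1, arg2) tuples."""
    optimized = []
    for dest, op, a, b in code:
        if op == "copy" and dest == a:
            continue                         # redundant: x := x does nothing
        if op == "*" and b == 2:
            op, b = "+", a                   # cheaper: x * 2 becomes x + x
        elif op == "*" and a == 2:
            op, a = "+", b                   # cheaper: 2 * x becomes x + x
        optimized.append((dest, op, a, b))
    return optimized


program = [
    ("t1", "*", "x", 2),
    ("x", "copy", "x", None),
    ("y", "+", "t1", 1),
]
print(peephole(program))
# [('t1', '+', 'x', 'x'), ('y', '+', 't1', 1)]
```

Reducing how often instructions execute (for example, moving loop-invariant code out of a loop) needs control-flow information and is not covered by this sketch.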
How do you eliminate ambiguity?
Left factoring: delay the decision about which production to use by factoring out any common prefix shared by the productions for a nonterminal.
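As a rough illustration of left factoring, the sketch below performs one round of factoring for a single nonterminal. Productions are lists of symbols, new nonterminals are named by appending a prime, and "epsilon" stands for the empty string. These conventions and the function names are assumptions, and the fresh names are assumed not to clash with existing ones.

```python
def common_prefix(bodies):
    """Longest prefix shared by every production body in the list."""
    prefix = []
    for column in zip(*bodies):              # stops at the shortest body
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(nonterminal, bodies):
    """One round of left factoring; returns a dict of rewritten productions."""
    groups = {}
    for body in bodies:                      # group bodies by their first symbol
        key = body[0] if body else None
        groups.setdefault(key, []).append(body)

    result = {nonterminal: []}
    fresh = nonterminal
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nonterminal].extend(group)    # nothing to factor
            continue
        prefix = common_prefix(group)
        fresh += "'"                             # new nonterminal for the tails
        result[nonterminal].append(prefix + [fresh])
        result[fresh] = [body[len(prefix):] or ["epsilon"] for body in group]
    return result


rules = left_factor("S", [
    ["if", "E", "then", "S", "else", "S"],
    ["if", "E", "then", "S"],
    ["other"],
])
for head, bodies in rules.items():
    print(head, "->", " | ".join(" ".join(body) for body in bodies))
# S -> if E then S S' | other
# S' -> else S | epsilon
```

After factoring, the parser commits to "if E then S" first and only chooses between "else S" and epsilon once it has seen the input that follows.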
Syntax Analyzer
Reads the tokens from the lexical analyzer and then generates a parse tree according to the grammar of the source code. It reports syntax errors to the user.
Lexical Analyzer
Scans the source code as a stream of characters and converts it into meaningful lexemes in the form of tokens.
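A small regex-based scanner shows the idea. The token classes below (NUMBER, ID, OP, SKIP) and their patterns are assumptions chosen for the example, and production lexers are usually generated from a specification instead of written this way.

```python
import re

# Hypothetical token classes: each pairs a token name with a regex pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a string of characters into a list of (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":        # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 42 + y1"))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('ID', 'y1')]
```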
How do you minimize the overhead with reading the source program one character at a time?
Specialized buffering schemes, such as a buffer split into two halves that are reloaded alternately.
Intermediate Code Generation
Takes the internal representation of the program and generates low-level code that is still machine independent.
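Three-address code is one widely used machine-independent form. The sketch below assumes the internal representation is an expression tree of nested (op, left, right) tuples with string leaves, and it emits one instruction per operator using temporaries t1, t2, and so on. The tree format and the function name are illustrative assumptions.

```python
import itertools


def gen_three_address(tree):
    """Translate an expression tree into three-address instructions.

    Returns (instructions, name_holding_the_result).
    """
    code = []
    temps = itertools.count(1)               # numbers for fresh temporaries

    def walk(node):
        if isinstance(node, str):            # leaf: a variable or constant name
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    return code, result


code, result = gen_three_address(("+", "a", ("*", "b", "c")))
print(code)      # ['t1 = b * c', 't2 = a + t1']
print(result)    # t2
```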
Code Generation
Takes the optimized representation of the intermediate code and maps it to the target machine language.
Token
A name for a class of strings in the input.
Pattern
A rule that describes the set of strings associated with a token.
What is the code for input buffering?
if fwd at end of first half then begin reload second half; fwd := fwd + 1 end else if fwd at end of second half then begin reload first half; move fwd to beginning of first half end else fwd := fwd + 1;
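The pseudocode above can be sketched as runnable Python. The version below is only an approximation under stated assumptions: the class name TwoHalfBuffer, the tiny default half size, and the use of an empty string as an end-of-input marker are all choices made for this example, and the pointer that marks the start of the current lexeme is left out.

```python
import io


class TwoHalfBuffer:
    """Buffer of two halves; each half is reloaded when fwd leaves the other."""

    def __init__(self, stream, half_size=4):
        self.stream = stream
        self.n = half_size
        self.buf = [""] * (2 * half_size)
        self._reload(0)                      # fill the first half before scanning
        self.fwd = 0

    def _reload(self, start):
        """Read up to n characters into the half that begins at index start."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else ""

    def next_char(self):
        """Return the character under fwd and advance; '' means end of input."""
        ch = self.buf[self.fwd]
        if ch == "":
            return ""
        if self.fwd == self.n - 1:           # fwd at end of first half
            self._reload(self.n)             # reload second half
            self.fwd += 1
        elif self.fwd == 2 * self.n - 1:     # fwd at end of second half
            self._reload(0)                  # reload first half
            self.fwd = 0                     # move fwd to beginning of first half
        else:
            self.fwd += 1
        return ch


buffer = TwoHalfBuffer(io.StringIO("hello world"), half_size=4)
print("".join(iter(buffer.next_char, "")))   # hello world
```

The stream is read in blocks of half_size characters, so the cost of each read is spread over many calls to next_char.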
How do you handle reserved words?
Option 1: have a separate finite automaton for each reserved word. Option 2: put the reserved words in a table and search this table whenever an identifier is found.
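Option 2 is simple to sketch: scan every word as an identifier, then check it against a table. The reserved words listed below and the convention of naming each keyword token after the upper-cased word are assumptions for the example.

```python
# Hypothetical reserved-word table for a small language.
RESERVED = {"if", "then", "else", "while"}


def classify(lexeme):
    """Return (token, lexeme) for a word already scanned as an identifier."""
    if lexeme in RESERVED:                   # table lookup decides keyword vs. ID
        return (lexeme.upper(), lexeme)
    return ("ID", lexeme)


print(classify("while"))    # ('WHILE', 'while')
print(classify("whilst"))   # ('ID', 'whilst')
```

Using a set keeps the lookup cheap, and adding a reserved word only means adding a table entry, not building another automaton.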
Lexeme
The actual input string that matches a pattern.