Compilers - Test 1, Set 1
List the advantages and disadvantages of building a single pass compiler
+ Faster compile time
+ Easier to write
- No code optimization
- More memory used
List the primary requirements of compiler optimization. Why is it customary to have multiple code optimization passes?
1. Must not change the meaning of the code
2. Must improve speed and/or memory usage
3. Must be worth the effort/cost
It is customary to have multiple optimization passes because each pass can apply a different optimization technique (dead code elimination, copy propagation, etc.), and one pass may expose opportunities for the next. A single pass also cannot see information (such as declarations or uses of variables later in the file) until the whole input has been read.
v1. Explain how to obtain a left linear grammar that is equivalent to a right linear grammar (i.e. they generate the same language) v2. Given an RLG, explain how you would construct an LLG that generates the same language
1. Draw the finite state machine (FSA) M for the RLG
2. Reverse the directions of the transitions (arrows)
3. Swap start and final states: make the final states start states and the start states final states
4. If the machine now has more than one start state, add a new start state with ε-transitions to each of the old start states
5. Write the grammar for the new (reversed) machine; this is an RLG that generates L^R
6. Reverse the right-hand side of each production of that RLG to obtain the LLG, which generates (L^R)^R = L (a sketch of this step is below)
7. (For the extra mile) draw the machine for the new LLG to verify that the same language is generated
In summary: RLG (L) → M (L) → M^R (L^R) → RLG (L^R) → LLG ((L^R)^R = L)
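A minimal sketch of step 6 only, assuming the grammar is stored as a dict mapping each variable to (terminal-string, next-variable-or-None) productions; the representation and names are illustrative, not part of any course project.

```python
# Sketch of step 6: given the RLG that generates L^R, reverse every
# right-hand side to obtain an LLG that generates (L^R)^R = L.
# Hypothetical representation: each production is (terminals, next_var),
# with next_var = None for productions of the form A -> x.
def rlg_to_llg(rlg_productions):
    """{var: [(terminals, next_var_or_None), ...]}  ->
       {var: [(prev_var_or_None, terminals), ...]}
       i.e. A -> xB becomes A -> Bx, and A -> x stays A -> x."""
    return {var: [(nxt, terminals) for terminals, nxt in prods]
            for var, prods in rlg_productions.items()}

# Example RLG for L^R:  S -> aA | b,  A -> bA | a
rlg = {"S": [("a", "A"), ("b", None)], "A": [("b", "A"), ("a", None)]}
print(rlg_to_llg(rlg))
# {'S': [('A', 'a'), (None, 'b')], 'A': [('A', 'b'), (None, 'a')]}
```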
List the purpose/characteristics of an intermediate language
Intermediate code is the code that sits between the source code and the target code. The intermediate code should be:
1. Easy to produce
2. Easy to convert into target code
3. Supportive of optimization
4. Machine independent
5. Portable/retargetable
List the steps in the analysis phase of a typical compiler. Describe the output of each step.
1. Lexical Analysis: Source code → Token string
2. Syntax Analysis: Token string → Parse tree
3. Semantic Analysis: Parse tree → Decorated parse tree
4. Intermediate Code Generation: Decorated parse tree → Intermediate code
List the steps involved in a typical compiler. Be sure to specify the input and output of each step.
1. Lexical Analyzer (Input: Source code; Output: Tokens)
2. Syntax Analyzer (Input: Tokens; Output: Parse tree)
3. Semantic Analyzer (Input: Parse tree; Output: Decorated parse tree)
4. Intermediate Code Generator (Input: Decorated parse tree; Output: Intermediate code)
5. Code Optimizer (Input: Intermediate code, or the previous pass's optimized code; Output: Optimized intermediate code)
6. Code Generator / Target Code Generation (Input: Optimized intermediate code; Output: Target/object code)
List two programming language features that cannot be modeled by regular grammars.
1. Matching constructs: brackets, braces, parentheses, begin/end pairs (see the sketch below)
2. Requiring functions and variables to be defined before use
3. Nested constructs (e.g., nested loops/blocks)
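As a quick illustration of the first item: checking matched brackets needs unbounded memory (a stack), which a finite automaton does not have. A minimal sketch, with a made-up function name:

```python
# Checking matched brackets needs a stack: finitely many states cannot
# remember an unbounded number of unclosed openers.
def brackets_balanced(text):
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)            # remember the opener
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False            # closer with no matching opener
    return not stack                    # every opener must be closed

print(brackets_balanced("([x] + {y})"))   # True
print(brackets_balanced("(]"))            # False
```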
Describe three lexical error handling techniques.
1. Panic Mode: give up on the offending input (skip ahead) and just report the error; no attempt is made to fix it
2. Error Transformation: attempt to repair the input into a correct token, i.e., fix the error and continue
3. Minimum Distance Error Correction: fix the error by CHANGING the token to the nearest valid token (e.g., "ife" to "if"), using the smallest number of transformations that makes the file well formed (sketched below)
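A rough sketch of the minimum distance idea, assuming the set of valid tokens is just a small keyword list; edit_distance is the standard Levenshtein dynamic program, not anything specific to the course's analyzer.

```python
# Sketch: repair a bad lexeme by replacing it with the valid token that is
# the fewest single-character edits (insert/delete/substitute) away.
def edit_distance(a, b):
    # classic Levenshtein dynamic program, one row at a time
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,            # delete
                                     dp[j - 1] + 1,        # insert
                                     prev + (ca != cb))    # substitute
    return dp[-1]

def nearest_token(lexeme, valid_tokens):
    return min(valid_tokens, key=lambda t: edit_distance(lexeme, t))

print(nearest_token("ife", ["if", "then", "else", "while", "begin", "end"]))  # if
```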
Why use a grammar?
1. Precise, easy-to-use specification of the language
2. Parsers can be generated automatically for certain classes of grammars
3. Supports syntax-directed translation
4. Eases language extension/modification
v1. Explain the rationale behind ordering finite state automata in a lexical analyzer v2. List the rules for ordering finite state automata in a lexical analyzer
1. Recognize superstrings before substrings (e.g., reals before ints). The rationale is that tokens are then recognized correctly without the need for backtracking/memory. If the integer machine came before the real machine, "1.3" would be recognized as integer 1, period, integer 3: a valid token would be returned before the input could ever be tested as the real 1.3.
2. Most common machines first (ids, whitespace, ...). The rationale is that recognizing the most common cases first means fewer checks are performed overall: most tokens enter one of the first few machines and never reach the later, uncommon-case machines, so the fewest machines have to be tried per token.
3. Catch-all machine placed last. This ensures that all input is tokenized (see the sketch below).
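A rough sketch of the ordering rules, assuming each "machine" can be modeled as a function that tries to match at a position and either returns a token or fails; the machine set and token names are hypothetical.

```python
import re

# Hypothetical token machines: each tries to match at position i and returns
# (token_type, lexeme, new_i) or None.  The ORDER encodes the three rules:
# common machines first, superstrings (REAL) before substrings (INT),
# and a catch-all machine last.
def make_machine(token_type, pattern):
    rx = re.compile(pattern)
    def machine(text, i):
        m = rx.match(text, i)
        return (token_type, m.group(), m.end()) if m else None
    return machine

MACHINES = [
    make_machine("WS",   r"\s+"),        # most common machines first
    make_machine("ID",   r"[A-Za-z]\w*"),
    make_machine("REAL", r"\d+\.\d+"),   # superstring before substring
    make_machine("INT",  r"\d+"),
    make_machine("CATCHALL", r"."),      # placed last: every input is tokenized
]

def tokenize(text):
    i, tokens = 0, []
    while i < len(text):
        for machine in MACHINES:
            result = machine(text, i)
            if result:
                tokens.append(result[:2])
                i = result[2]
                break
    return tokens

print(tokenize("x := 1.3"))
# [('ID', 'x'), ('WS', ' '), ('CATCHALL', ':'), ('CATCHALL', '='),
#  ('WS', ' '), ('REAL', '1.3')]
```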
List the main features of a programming language that would simplify lexical analysis
1. Reserved words (keywords cannot also be used as identifiers)
2. No comments
3. No overlapping lexemes
4. Required whitespace between tokens
Name 5 compiler construction tools
1. Scanner generators (lex)
2. Parser generators (yacc)
3. Syntax-directed translation engines
4. Automatic code generators
5. Compiler-compilers (compiler-construction toolkits)
Explain why most compilers have a front end and a back end.
1. Simplicity: designing the front end and back end separately produces two simpler, connected components. One big machine is unwieldy and hard to design, maintain, and debug.
2. Portability/retargetability: the back end can be swapped to target a different machine without altering the front end.
3. Modularity: the front end can be changed to understand a different source language, or either half updated, without replacing both.
Explain the essential difference between a compiler and an interpreter
A compiler takes in source code and outputs optimized target code that is run at a separate time ("compile once, run many"). It is more suited to static languages. An interpreter takes in source code and data together and outputs the result immediately as it processes them; its results are held in memory. It is more suited to dynamic languages.
List the main programming language features that facilitate its implementation via an interpreter.
A language that has dynamic features is best for an interpreter (as opposed to a language with static features, which is best for a compiler). An interpreter must do analysis and synthesis in one step, where a compiler does not. Some dynamic language features that would point toward an interpreter are:
1. Dynamic scoping
2. Dynamic memory allocation
3. Dynamic types
The trade-off: an interpreter is slower and less efficient, since it has to go through the whole process (re-tokenize, re-analyze) every time the program is run.
V1. List the main programming language features that facilitate its implementation via a compiler V2. Suppose you are asked to implement a translator (as opposed to an interpreter) for a new language. List the characteristics of the language and its application that would influence you to build a compiler.
A language that has static features is best for a compiler (as opposed to a language with dynamic features, which is best for an interpreter), because an interpreter must do analysis and synthesis in one step where a compiler does not. Some static language features that would influence me to build a compiler are:
1. Static memory allocation
2. Static scoping
3. Static type checking
4. Static variables
5. A need for optimization
List a programming language feature that cannot be implemented by a one pass compiler. Explain why it is impossible to implement this feature.
Extensive code optimization. A one-pass compiler does not know the meaning of the whole program until the end of its single pass, and to guarantee that an optimization does not change the meaning of the code, the compiler must know that meaning before it starts optimizing. Such optimization requires knowledge of the entire input file, which requires multiple passes.
What does it mean for a compiler to have multiple passes?
A pass is the process of reading a source (input) file, or a representation of it, from start to finish and writing an output file. A multi-pass compiler does this several times, each pass reading the previous pass's output, before producing the final output file. This differs from a single-pass compiler, which reads the input file once and writes one final output file. Multiple passes allow for more thorough error checking and optimization.
Briefly describe the analysis and synthesis components of a compiler
ANALYSIS PORTION
The analysis portion of the compiler (the front end) is responsible for reading the input (source program), generating tokens, parsing them, and performing semantic analysis, together with symbol table management and error handling. It breaks the source code into small parts and creates intermediate code. It consists of the phases (or parts of those phases) that depend on the SOURCE LANGUAGE, and it is independent of the target language and hence the target machine.
SYNTHESIS PORTION
The synthesis portion (the back end) consists of code optimization and final code generation. It takes in the intermediate code, never looking at the source code, and produces either relocatable machine code or assembly code. It also works with the symbol table and does error handling. It is independent of the source language.
List the pros and cons of having one pass instead of multiple passes
Advantages:
• Smaller than a multiple pass compiler
• Faster than a multiple pass compiler
• Only needs to pass through each phase of the compiler once
Disadvantages:
• Unable to generate programs that are as efficient
• Limited scope of information
• Data declarations must be made before they are referenced
List the pros and cons of having multiple passes instead of one pass
Advantages:
• Generates more efficient programs (that use less memory)
• Supports code optimization
• Supports declarations that appear after their use (forward references)
Disadvantages:
• Larger than a single pass compiler
• Slower than a single pass compiler
Give the typical grouping tasks in a multiple pass compiler
Analysis Phase:
1. Lexical analyzer
2. Syntax analyzer
3. Semantic analyzer
Synthesis Phase:
4. Intermediate code generation
5. Code optimizer (multiple passes)
6. Target code generation (final pass)
v1. List the main programming language features that facilitate its implementation via a compiler. v2. Suppose you are asked to implement a translator for a new language. List the characteristics of the language and its applications that would influence you to build a compiler
Characteristics:
• Static memory allocation
• Static scoping
• Static type checking
• Optimization
• Static variables
Applications:
• Real-time systems
• Applications built on static features
• Need to be portable/retargetable
• Need to be fast/efficient
• Will be run many times
• Need to be optimized
List the Advantages and disadvantages of compilation versus interpretation
Compilation
+ Compile once, run many (consistently)
+ Optimization
+ Static language features
+ Compile-time errors (preferable to run-time errors)
+ Efficient/fast once compiled
+ Portable/retargetable
+ Good for real-time systems
- Must wait for compilation (lag time)
- Requires more resources
- Does not work well with dynamic types and dynamic structures
- Not as well suited for debugging as an interpreter
Interpretation
+ Human oriented
+ Rapid program development and testing
+ Dynamic language features
- Run-time errors
- Needs to be interpreted each time it is run
- No optimization
- No static language features
- Program must be stored in memory
- Not good for real-time systems
- Slow, less efficient
Name types of applications that compilers are best used for:
Compilers are fast and efficient, so they are good for real-time systems where time is at a premium. They are also good when optimization matters. Because they are portable and easy to retarget, they are good for writing something in one place for use someplace else. They are not good for human-oriented applications (for example, they are not as well suited for debugging as an interpreter).
Explain how you might modify your lexical analyzer to make it more efficient
Currently it is an NDFAe; changing it to a DFA would speed it up because that would eliminate backtracking. However, it would be far more difficult to program because it would have far too many states. Hence, we should just try to reduce backtracking as much as possible:
1. Put the reals, long reals, and integers together in one machine (see the sketch below)
2. Use hashing for symbol table / reserved word lookups
3. Reorder the machines so that the most used machines come first
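A sketch of suggestion 1: one combined number machine that classifies a lexeme as INT, REAL, or LONGREAL in a single left-to-right scan. It assumes the convention that a number with an exponent is a long real; the function name and return convention are illustrative only.

```python
# One combined number machine: scan digits, then an optional fraction, then
# an optional exponent, and classify the lexeme as INT, REAL, or LONGREAL in
# a single left-to-right pass (no separate int/real/longreal machines to
# back out of).  Returns (kind, lexeme, next_index) or None.
def number_machine(text, i):
    start = i
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == start:
        return None                                   # not a number at all
    kind = "INT"
    if text[i:i + 1] == "." and text[i + 1:i + 2].isdigit():
        kind = "REAL"                                 # fraction part
        i += 1
        while i < len(text) and text[i].isdigit():
            i += 1
    if text[i:i + 1] in ("E", "e"):
        j = i + 1 + (text[i + 1:i + 2] in ("+", "-"))
        if text[j:j + 1].isdigit():
            kind = "LONGREAL"                         # exponent part
            i = j
            while i < len(text) and text[i].isdigit():
                i += 1
    return (kind, text[start:i], i)

print(number_machine("1.3E+2 rest", 0))   # ('LONGREAL', '1.3E+2', 6)
print(number_machine("42;", 0))           # ('INT', '42', 2)
```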
List the relative advantages and disadvantages of a DFA, NDFA and NDFAe
DFA
+ No backtracking (fastest / most time efficient)
+ Deterministic
+ No ε-transitions
- Difficult to read and write
- Least general / least flexible notation
- Most states (can blow up to 2^n)
NDFA
+ Easier to read and write
+ More general / more flexible
+ Least states (comparable to an NDFAe)
+ No ε-transitions
- Backtracking (slower)
- Not deterministic
NDFAe
+ Easiest to read and write (simplest / most general)
+ Most flexible
+ Least states (comparable to an NDFA)
- Backtracking (slowest)
- Not deterministic
- Requires ε-transitions
It is well known that all finite languages are regular. Is it true that infinite languages cannot be regular? Provide a strong argument for your answer.
False: a* is regular and infinite. The machine is a single state that is both the start state and a final state, with a self-loop on a.
Explain the significance and main characteristic of each of the four classes of machines: Finite State Machines, Pushdown Automata, Linear Bounded Automata, and Turing Machines
Finite State Machines
- Type 3
- No memory
- Recognize and generate the regular languages
- Equivalent to regular grammars
- Expressible as regular expressions
- Can backtrack
Pushdown Automata
- Type 2
- Stack memory
- Context-free grammars / languages
Linear Bounded Automata
- Type 1
- Linearly bounded memory
- Context-sensitive grammars / languages
- Can't backtrack
Turing Machines
- Type 0
- Unlimited memory
- Unrestricted grammars
- Recursively enumerable sets
- In one-to-one correspondence with the set of integers (countable)
- The most powerful computer humans can build
Identify the principal purpose of each step in a compiler front end and back end
Front End
1. Lexical Analysis: takes a stream of characters and tokenizes it. Returns a token string and the symbol table.
2. Syntactic Analysis: creates a parse tree / syntax tree of the tokens that shows their relation in terms of the language, and checks for syntax errors in the structure and order of tokens. I.e., groups the stream of tokens into sentences.
3. Semantic Analysis: gives meaning to the tree produced by syntactic analysis by decorating it. Examines the parse tree and performs type checking. I.e., checks the meaning of the sentences.
4. Intermediate Code Generation: reads in the output of the earlier phases and creates intermediate code, beginning the transition to target code.
Back End
5. Code Optimization: optimizes the intermediate code to improve the space or speed of the program, or both. Returns more intermediate code. Can make multiple passes.
6. Code Generation: generates code in the target language.
Explain how you would modify your lexical analyzer to handle comments.
I would create a machine that recognizes the special comment characters for Pascal ((* ... *) or { ... }, and // in some dialects). When the opening character(s) are recognized, I would move the forward pointer to the end of the line / newline for //, or until it reaches the matching closer "*)" for the other form, without outputting any tokens. You need memory to handle multi-line comments and match the tail end. A sketch follows below.
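A minimal sketch of the forward-pointer movement, assuming non-nested (* *) comments and a // form; the function is illustrative, not the project's actual scanner code.

```python
# Skip a Pascal-style comment without emitting any tokens: move the forward
# pointer to the end of the line for //, or past the matching "*)" for (* *).
def skip_comment(text, i):
    if text.startswith("//", i):
        while i < len(text) and text[i] != "\n":
            i += 1
        return i
    if text.startswith("(*", i):
        i += 2
        while i < len(text) and not text.startswith("*)", i):
            i += 1                          # may cross line boundaries
        return min(i + 2, len(text))        # step over the closing "*)"
    return i                                # not at a comment

src = "x := 1 (* a multi-line\ncomment *) + 2"
i = src.index("(*")
print(repr(src[skip_comment(src, i):]))     # ' + 2'
```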
Regular Expression
I. ∅, ε, and a ∈ Σ are regular expressions
II. If r1 and r2 are regular expressions, then r1 | r2, r1r2, (r1), and r1* are regular expressions
III. Anything built from finitely many applications of I and II is a regular expression
Explain why a finite language is always a regular language.
If a language is finite, it is the union of finitely many strings; each string is itself a regular expression, and regular expressions are closed under union (|), so the whole language can be written as a single regular expression. Therefore, it is a regular language.
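For example, the finite language {if, then, else} is just the regular expression if | then | else; a quick check with Python's re module:

```python
import re

# A finite language {w1, ..., wn} is just the regular expression w1|w2|...|wn.
finite_language = ["if", "then", "else"]
pattern = re.compile("^(" + "|".join(re.escape(w) for w in finite_language) + ")$")

print(bool(pattern.match("then")))   # True
print(bool(pattern.match("elif")))   # False
```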
Why are Error Transformations and Minimum Distance Error Corrections a bad idea?
They could change the input into something you did not mean it to be, i.e., repair it the wrong way. (Minimum distance error correction is the more general of the two.) These techniques are not really used anymore; people are now more careful about putting intelligence into error correction.
DFAs and NFAs are supposed to be equally powerful in terms of the language they recognize. If this is true, why is it sometimes necessary to select a DFA model instead of an NFA model in an implementation? Give reasons for selecting a DFA over an NFA.
It is true that DFAs and NFAs are equally powerful in terms of the languages they recognize. However, a DFA must sometimes be selected because DFAs are deterministic (NDFAs are not). Hence, DFAs should be chosen when time is of the essence: DFAs are much faster than NDFAs because NDFAs require backtracking whereas DFAs do not. The drawback of nondeterminism is that when you hit a fork in the road you have to take both roads, which means threads and/or backtracking.
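A small made-up example of the difference in work: simulating the NFA means carrying a whole set of possible states forward (the "take both roads" cost, whether done with sets, threads, or backtracking), while the DFA follows exactly one state per input symbol. Both machines below accept strings over {a, b} ending in "ab".

```python
# Hypothetical NFA over {a, b} accepting strings that end in "ab":
# delta maps (state, symbol) to a SET of next states (the fork on 'a').
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
nfa_final = {2}

def nfa_accepts(word):
    states = {0}                        # must carry a set of possible states
    for ch in word:
        states = set().union(*(nfa_delta.get((s, ch), set()) for s in states))
    return bool(states & nfa_final)

# Equivalent DFA (what the subset construction would give): one state at a time.
dfa_delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 1, (2, "b"): 0}
dfa_final = {2}

def dfa_accepts(word):
    state = 0                           # exactly one current state, no forks
    for ch in word:
        state = dfa_delta[(state, ch)]
    return state in dfa_final

print(nfa_accepts("bab"), dfa_accepts("bab"))   # True True
print(nfa_accepts("aba"), dfa_accepts("aba"))   # False False
```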
Explain how you would modify your lexical analyzer to handle keywords.
Keywords would be placed in the symbol table, and the syntactic or semantic analyzer would have to determine whether the token is being used as a keyword or as an id. This increases complexity and can introduce ambiguity into the program: the lexical analyzer cannot decide between id and keyword by itself, so it passes the token on and the parser decides. A sketch of the lookup is below.
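A minimal sketch of the lookup, assuming a small hypothetical keyword set. If the language reserves its keywords the lexer can resolve the question itself; otherwise (as the answer above assumes) it returns a generic word token and leaves the decision to the parser.

```python
# Pre-load the keywords into a table and look each word-like lexeme up after
# the ID machine accepts it.  The keyword set here is hypothetical.
KEYWORDS = {"program", "begin", "end", "if", "then", "else", "while"}

def classify_word(lexeme, keywords_are_reserved=False):
    if keywords_are_reserved and lexeme.lower() in KEYWORDS:
        return ("KEYWORD", lexeme.lower())     # lexer can decide on its own
    return ("WORD", lexeme)                    # parser decides id vs keyword

print(classify_word("begin"))                              # ('WORD', 'begin')
print(classify_word("begin", keywords_are_reserved=True))  # ('KEYWORD', 'begin')
```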
Describe 4 code optimization techniques
Loop Optimization: a lot of time and space is spent inside loops, so loop optimization reduces the overhead associated with them.
Dead Code Elimination: removes dead code, i.e., code whose results are never used or that serves no purpose.
Common Subexpression Elimination: searches for instances of identical expressions (that all evaluate to the same value) and analyzes whether it is worthwhile to replace them with a single variable holding the computed value.
Copy Propagation: replaces occurrences of the targets of direct assignments with their values.
Function Preserving Transformations: e.g., x + 2 - 3 → x - 1. A small sketch of copy propagation and dead code elimination follows below.
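A toy sketch of copy propagation followed by dead code elimination on straight-line three-address code; the (dest, op, arg1, arg2) instruction format is made up purely for illustration.

```python
# Toy three-address instructions for straight-line code: (dest, op, a, b),
# where op "copy" means dest = a and "add"/"mul" mean dest = a <op> b.
code = [
    ("t1", "copy", "x", None),
    ("t2", "add",  "t1", "y"),   # t1 is just a copy of x
    ("t3", "mul",  "t1", "y"),   # t3 is never used: dead code
    ("z",  "copy", "t2", None),
]

def copy_propagate(instrs):
    """Replace uses of a copy's target with its source."""
    copies, out = {}, []
    for dest, op, a, b in instrs:
        a, b = copies.get(a, a), copies.get(b, b)
        # any mapping involving the variable being (re)defined is now stale
        copies = {k: v for k, v in copies.items() if dest not in (k, v)}
        if op == "copy":
            copies[dest] = a
        out.append((dest, op, a, b))
    return out

def eliminate_dead(instrs, live_out):
    """Drop instructions whose result is never used afterwards."""
    live, out = set(live_out), []
    for dest, op, a, b in reversed(instrs):
        if dest in live:
            out.append((dest, op, a, b))
            live.discard(dest)
            live.update(v for v in (a, b) if v is not None)
    return list(reversed(out))

for instr in eliminate_dead(copy_propagate(code), live_out={"z"}):
    print(instr)
# ('t2', 'add', 'x', 'y')
# ('z', 'copy', 't2', None)
```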
Non-Deterministic Finite State Automaton with ε-Transitions (NFAe)
M = (S, Σ, δ, s0, F)
S: Finite set of states
Σ: Finite input alphabet
s0 ∈ S: Start state
F ⊆ S: Set of final states
δ: Transition function, δ: S × (Σ ∪ {ε}) → 2^S
(for an NFA without ε-transitions: δ: S × Σ → 2^S)
In a typical multi-pass compiler, identify and group the tasks that can be performed in a single pass and those that require multiple passes
One pass for lexical analysis, syntax analysis, semantic analysis, and intermediate code generation.
Multiple passes for code optimization.
One or more passes for code generation.
Describe the main techniques for handling lexical errors.
PANIC MODE
The lexical analyzer expects something and does not get it. In panic mode we do not try to fix anything; we skip input until something in the synchronizing set is found, effectively kicking the can down the road and letting a later phase deal with the problem.
ERROR TRANSFORMATIONS
Attempt to repair the input by making LOCAL changes, on the assumption that most lexical errors are the result of a single error transformation. This is not a good strategy:
1. The lexical analyzer could change the input into something you did not mean, i.e., repair it the wrong way.
2. It can lead to infinite loops.
3. This type of error processing is rarely used anymore; people are now more careful about putting intelligence into error correction.
MINIMUM DISTANCE ERROR CORRECTION
Changes everything it thinks is bad into something it thinks is good, using the minimum number of changes: it computes the minimum number of error transformations required to turn the erroneous program into one that is syntactically well formed.
Give the main reasons for separating lexical analysis from syntax analysis in modern compilers
Portability, simplicity, and modularity: the two phases deal with different aspects of compilation and require different inputs. The lexical analyzer simply produces the tokens and the syntax analyzer checks the syntax of the tokens, so if one wanted to modify the syntax, one would only need to modify that machine.
All languages have:
Syntax: the appearance and structure of sentences
Semantics: the assignment of meaning to sentences
Pragmatics: usability
Which of the finite state automata formally models your lexical analyzer. For each finite state automaton, explain why it is or is not the right model
The individual machines themselves (e.g., id, relop, whitespace) are DFAs. However, the overall lexical analyzer is an NDFAe.
DFA: not the right model, because it does not support the backtracking our analyzer needs; in our lexical analyzer some states can go to multiple other states (e.g., the long real, real, and int machines).
NFA: not the right model, because we use ε-transitions to get into and out of the individual machines.
NFAe: the correct model, because the machines block (via ε-transitions) and backtracking is used.
Explain how a real lexical analyzer would handle integer constants.
Integer constants are computed during the lexical analysis phase and stored in a number table separate from the symbol table. The number is also recorded in the token file, with its attribute set to a pointer to the number table entry where the value is located.
1. Convert the "string" into an "int"
2. Create a separate table (the number table) to store numbers
3. Determine whether the integer is positive or negative
4. Compute the value digit by digit, e.g., "201" → ((0*10 + 2)*10 + 0)*10 + 1 = 201 (i.e., 2*100 + 0*10 + 1); see the sketch below
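A minimal sketch of steps 1, 2, and 4, with a hypothetical number_table list standing in for the real table.

```python
# Convert the digit string to an integer by accumulating digit by digit,
# store the value in the number table, and return a token whose attribute is
# a pointer (index) into that table.  Table layout is illustrative only.
number_table = []

def install_number(lexeme):
    value = 0
    for ch in lexeme:                     # "201" -> ((0*10+2)*10+0)*10+1 = 201
        value = value * 10 + (ord(ch) - ord("0"))
    number_table.append(value)
    return ("NUM", len(number_table) - 1)

print(install_number("201"), number_table)   # ('NUM', 0) [201]
```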
Explain why the algorithm for converting a finite state automaton to a regular expression is so significant.
Type 3 languages: the conversion can give a (regular expression) description of what any such machine/program does; given enough time, we could even build the Google source code this way.
Regular Grammar
V: Finite set of variables
T: Finite set of terminal symbols
S ∈ V: Start variable
P: Finite set of productions, where A, B ∈ V and x ∈ T*, of the form:
RLG: A → xB, B → x
LLG: A → Bx, B → x
Does an algorithm exist for transforming a regular language to a finite state automaton?
Yes, such an algorithm exists. It is significant because it allows machines/programs to be generated automatically from a language specification (for example, Lex generates a scanner from regular expressions). A sketch of one simple version of the construction is below.
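A sketch of one simple version of such an algorithm, assuming the regular language is specified as a right-linear grammar with productions of the form A → aB, A → a, or A → ε (single-character terminals); Lex does the analogous construction starting from regular expressions.

```python
# Right-linear grammar -> NFA:  A -> aB gives transition (A, a) -> B,
# A -> a goes to a fresh final state, and A -> epsilon makes A final.
def grammar_to_nfa(productions, start):
    FINAL = "$accept"
    delta, finals = {}, {FINAL}
    for var, rules in productions.items():
        for rhs in rules:
            if rhs == "":                                      # A -> epsilon
                finals.add(var)
            elif len(rhs) == 1:                                # A -> a
                delta.setdefault((var, rhs), set()).add(FINAL)
            else:                                              # A -> aB
                delta.setdefault((var, rhs[0]), set()).add(rhs[1:])
    return delta, start, finals

def accepts(nfa, word):
    delta, start, finals = nfa
    states = {start}
    for ch in word:
        states = set().union(*(delta.get((s, ch), set()) for s in states))
    return bool(states & finals)

# Example: S -> aS | bA,  A -> bA | b   (a's followed by two or more b's)
nfa = grammar_to_nfa({"S": ["aS", "bA"], "A": ["bA", "b"]}, "S")
print(accepts(nfa, "aabbb"))   # True
print(accepts(nfa, "ab"))      # False
```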
Suppose your source code is processed by removing all whitespace characters before being passed to your lexical analyzer. Would this affect the identification of tokens? If so, which tokens?
Yes: ID tokens would most likely be affected. If a letter is identified first, the ID/reserved-word machine would read the rest of the program as one identifier until a non-alphanumeric symbol is reached. This would lead to many errors, including "ID too long" and the inability to differentiate between IDs that come right after each other, increasing complexity and ambiguity. Tokens that do not begin with a letter could also be cut incorrectly, since the whitespace that used to separate them is gone.