Compilers


Table-driven parser

A pushdown automaton (PDA): push the start symbol first, and if it has been popped off by the end of the input, we know we're done. A generic table-parsing routine can use the parse table to parse the input.
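
As a rough illustration of the generic routine, here is a minimal Python sketch. The grammar (E -> T E', E' -> + T E' | e, T -> num), the contents of TABLE, and the name ll1_parse are assumptions made up for this example.

```python
# Hypothetical LL(1) parse table for the toy grammar:
#   E  -> T E'
#   E' -> + T E' | e
#   T  -> num
TABLE = {
    ("E", "num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],              # E' -> e when the end marker is next
    ("T", "num"): ["num"],
}
NONTERMINALS = {"E", "E'", "T"}


def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted by the table."""
    tokens = list(tokens) + ["$"]   # append the end-of-input marker
    stack = ["$", start]            # push the start symbol
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]          # one token of lookahead, not consumed
        if top in NONTERMINALS:
            rule = TABLE.get((top, look))
            if rule is None:        # empty table box: syntax error
                return False
            stack.extend(reversed(rule))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                # terminal matched: consume it
        else:
            return False
    return pos == len(tokens)
```

The routine itself never mentions the grammar; swapping in a different table would parse a different language.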

the grammar for (1+2+3) and (1+2+3)+4 is

non-deterministic without arbitrary lookahead. This is why we change a grammar from left recursive (the recursive non-terminal appears leftmost on the right-hand side) to right recursive.

Back end of compiler is responsible for

optimization, and code gen

Lexical Analysis (scanning)

Take a stream of individual chars and convert it to tokens. Detect lexical errors (badly formed literals, illegal chars), discard white space, discard comments.
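
A minimal scanner sketch in Python, assuming a made-up token set (NUMBER, ID, OP) and `#` line comments; none of these names come from the notes. It uses the standard `re` module to turn characters into tokens, skip white space and comments, and report illegal characters.

```python
import re

# Hypothetical token classes for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("COMMENT", r"#[^\n]*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Convert a string of characters into a list of (class, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Lexical error: no token class matches at this position.
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```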

CFG contains

Terminal symbols (Tokens generated by lexical analyzer), non-terminal symbols, production rules, start symbol.

symbol table built during _____ _______

semantic analysis

Comparing items that are separated in the program, or counting items, will end up being a ______ ______

semantic check.

Lexical analyzer generator

mechanically converts a set of regular expressions (and their action routines) into a program.

Interpreters' advantages over compilers

More flexible, better diagnostics, easier portability, take less space.

Why not regular expressions for syntactic analysis?

Because, being finite, they do not have enough power. We need CFGs because they can recognize arbitrarily deep nesting, which a DFA with a fixed number of states cannot track (parsing the input is more complex than O(n)).
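
To make the nesting point concrete, here is a small Python sketch (the function name balanced is an assumption). Checking parentheses needs a counter that can grow without bound, which is exactly what a fixed, finite set of DFA states cannot provide.

```python
def balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0                     # unbounded counter: not available to a DFA
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:         # a ')' with no open '(' before it
                return False
    return depth == 0
```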

what does LL(1) mean?

The question: can we automatically construct a parser from the CFG that creates an unambiguous parse tree for any given string, or tells us the string is unparseable?
L: left-to-right scan of the input.
L: leftmost derivation (the grammar uses right recursion).
1: one token of lookahead (look at the next token without consuming it).
Top-down; runs in O(n), where n is the size of the input string.

Semantic analysis covers stuff we can't cover in cfg and isn't convenient

Comparing argument lists in function calls to that given in the function's declaration. (This isn't context-free) Ensuring a function has at least one return statement. (inconvenient with CFG, return could be anywhere in the function)

Interpreter Vs Compiler

Compiler: the entire code is analyzed; more efficient because translation is done once; catches lots of errors.
Interpreter: one line at a time; less efficient because translation is done every time; more flexible.

Syntactic analysis (Parsing)

Context free. Takes a stream of tokens and converts it into a parse tree. Captures the hierarchical structure of the program (declarations, definitions, blocks, statements, expressions, BUT NOT undeclared identifiers or mismatched function calls). Syntactic errors are about form, NOT meaning.

Semantic analysis

Context-sensitive stuff.

Why do we use regular expressions to do lexical scanning?

There is a correspondence between REs and finite automata, and finite automata can be converted into a scanning program in a mechanical way.

Front end of the compiler or the first phases

Discovers the meaning of the source program; comprises the lexical, syntactic, and semantic analysis steps.

Role of the semantic analyzer

Enforce all static semantic rules. Annotate the parse tree with any info required by intermediate code gen (clarifications such as resolution of overloaded operations, and dynamic semantic checks).

semantic content of scalar

Its Type and Value

LR(k) Parsing

Left-to-right scan of the input, producing a Rightmost derivation, using k tokens of lookahead. Bottom-up: production rules are used right to left. Uses the stack in a more sophisticated way.
The first k tokens in the input stream form the lookahead.
Shift: push the first input token onto the stack.
Reduce: select some grammar rule, e.g. X -> A B C; pop C, B, A off the stack; and push X onto the stack.
If the parser makes it to the point where it can shift the EOF marker onto the stack, it has accepted the input as valid.
Three cases can occur when reducing:
1) The top few items do not match the RHS of any rule of the grammar: shift the next token onto the stack.
2) The top few items match the RHS of exactly one rule of the grammar: reduce, popping off the matching items and pushing the LHS of the rule.
3) The top few items match the RHS of multiple rules of the grammar: the parser doesn't know which reduction to make (there is more than one rule in an entry of the parse table). This is a reduce/reduce conflict.
Shift/reduce conflict: you could either shift or reduce (e.g. the dangling else).
Reduce/reduce conflict: the grammar itself is not necessarily ambiguous; the problem is that we only get 1 token of lookahead. Example:
A -> B c d
A -> E c f
B -> x y
E -> x y
Shifting and reducing are driven by a DFA; the DFA is applied to the contents of the stack, not to the CFG itself.

Improved CFG to make it unambiguous

Lower precedence comes first to enforce order of operations (a hierarchy: all add ops come before mul ops in the production rules, unless they are in parentheses). Associativity is enforced by using left recursion for that rule (expr -> expr add_op term).
expr -> term | expr add_op term
term -> factor | term mul_op factor
factor -> id | number | - factor | ( expr )
add_op -> + | -
mul_op -> * | /

(In predictive parsing to decide if we can do predict parsing) To predict which production rule to use, we need 3 things for every non-terminal X:

NULLABLE(X): does X ever derive e (the empty string)?
FIRST(X): which terminals can appear first in a string derived from X?
FOLLOW(X): which terminals can appear immediately after X?
Given these, we can construct the parse table for the CFG. The parse table tells us which rule to use when we are trying to parse a given non-terminal and are seeing a given terminal. Every box has at most 1 rule, meaning the LL(1) CFG is unambiguous; boxes with no rule mean the parse results in an error.
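
The three sets can be computed by iterating until nothing changes. The sketch below is one common fixed-point formulation, not necessarily the one used in the course; the toy GRAMMAR, the `$` end marker, and the name analyze are assumptions for illustration. An empty tuple stands for the e (empty) right-hand side.

```python
# Hypothetical toy grammar: E -> T E' ; E' -> + T E' | e ; T -> num
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("num",)],
}


def analyze(grammar, start):
    """Return (nullable, first, follow) for every non-terminal."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")          # end marker can follow the start symbol

    def first_of(symbol):
        # FIRST of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                   # repeat until a full pass adds nothing
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                # NULLABLE: every symbol on the RHS is nullable (or RHS is empty).
                if lhs not in nullable and all(s in nullable for s in rhs):
                    nullable.add(lhs)
                    changed = True
                # FIRST: walk the RHS until a non-nullable symbol is reached.
                for s in rhs:
                    new = first_of(s) - first[lhs]
                    if new:
                        first[lhs] |= new
                        changed = True
                    if s not in nullable:
                        break
                # FOLLOW: walk the RHS right to left, tracking what can come next.
                trailer = set(follow[lhs])
                for s in reversed(rhs):
                    if s in grammar:
                        new = trailer - follow[s]
                        if new:
                            follow[s] |= new
                            changed = True
                        if s in nullable:
                            trailer = trailer | first[s]
                        else:
                            trailer = set(first[s])
                    else:
                        trailer = {s}
    return nullable, first, follow
```

Calling `analyze(GRAMMAR, "E")` yields the sets from which each box of the parse table would be filled.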

Parsing complexity of regular expressions

O(n) where n is the length of the string you're trying to parse. Because the acceptance or rejection of an input string in a DFA will take no more transitions than the length of the input. (you consume a symbol on every transition)
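
A minimal Python sketch of why this is linear: the loop below makes exactly one transition per input character. The DFA itself (accepting unsigned literals such as 12 or 3.14) and the names TRANSITIONS, ACCEPTING, and accepts are assumptions for this example.

```python
# Hypothetical DFA for unsigned decimal literals: digits, optionally
# followed by a dot and more digits.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}


def accepts(text):
    """Run the DFA over text; one transition per character consumed."""
    state = "start"
    for ch in text:
        if ch in "0123456789":
            kind = "digit"
        elif ch == ".":
            kind = "dot"
        else:
            kind = "other"
        state = TRANSITIONS.get((state, kind))
        if state is None:           # no transition defined: reject
            return False
    return state in ACCEPTING
```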

Parsing complexity of Context free

O(n^3), where n is the number of tokens you're trying to parse

Last phases of compilation do what?

Produce the corresponding target code. This is the back end. It comprises optimization of the intermediate code and target code generation.

patterns of token classes as

Regular expressions, easy to write, lots of theory to help with processing

Undeclared identifiers and mismatched function calls are a _____ error. Why?

Semantic error because they require context.

Compiler translates _______ into equivalent _______ then ____ _____

Source program into equivalent target program and then (compiler) goes away

Target code generation

Translates intermediate form into the target language, hard to generate good code.

RE

No recursion, although it may appear to have it, e.g. S -> aS

non-deterministic without arbitrary lookahead

While the grammar is not ambiguous, it is non-deterministic without arbitrary lookahead. Compare (1+2+3) vs. (1+2+3)+4 under:
expr -> expr add_op term
expr -> term
term -> term mul_op factor
term -> factor
The correct first production is expr -> term in the first case but expr -> expr add_op term in the second case.

The output of syntactic analyzer phase is:

A parse tree, which represents the structure of a particular input token stream as determined by the language's grammar rules. It is unique for a given stream of tokens (if not, the grammar is ambiguous and needs to be fixed).

Recursive descent parser

A parser with one routine written for each non-terminal; the decisions in the parse table are written directly into the code, which is slightly more efficient.
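
A minimal recursive descent sketch in Python for an expression grammar with add ops, mul ops, unary minus, and parentheses. The class and method names are assumptions, tokens are assumed to arrive as Python ints and operator strings, and the right-recursive "tail" rules are folded into while loops so the resulting tree stays left associative.

```python
class Parser:
    """One method per non-terminal: expr, term, factor."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # end-of-input marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]         # lookahead, not consumed

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, saw {self.peek()!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):     # lowest precedence, outermost
            op = self.peek()
            self.match(op)
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.match(op)
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
            return node
        if tok == "-":
            self.match("-")
            return ("neg", self.factor())
        if isinstance(tok, int):
            self.pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens):
    """Parse a whole token list and return a nested-tuple tree."""
    parser = Parser(tokens)
    tree = parser.expr()
    parser.match("$")                        # reject trailing tokens
    return tree
```

For example, `parse([1, "+", 2, "*", 3])` groups the multiplication below the addition, reflecting the precedence built into the grammar.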

As static semantic phase runs it transforms the parse tree to an

abstract syntax tree (AST), which keeps only the essential info and uses a symbol table. Many compilers use the AST as the intermediate form to hand off to the back end for code gen (a lower-level language); other compilers tree-walk the AST and generate a different intermediate form.

Action and Goto table

Analysis of the grammar results in an action table and a goto table. When you're in a given state and see a specific token, the action table says whether to shift or to reduce by a certain rule; after a reduce, the goto table gives the next state.
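
A small sketch of how such tables drive a shift-reduce parser. The tables below were built by hand for an assumed toy grammar (rule 1: E -> E + n, rule 2: E -> n); the state numbers and the names ACTION, GOTO, RULES, and lr_parse are all hypothetical.

```python
# Hand-built tables for the assumed toy grammar:
#   rule 1: E -> E + n
#   rule 2: E -> n
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule number -> (LHS, length of RHS)


def lr_parse(tokens):
    """Return True if the tokens are accepted by the tables above."""
    tokens = list(tokens) + ["$"]
    stack = [0]                       # stack of DFA states
    pos = 0
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:            # empty table entry: syntax error
            return False
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]       # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:                         # accept
            return True
```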

Turing complete

Any real-world general-purpose computer or language can simulate the computational aspects of any other computer or language.

Interpretation in C happens when

at run time, printf looks through its format string for the special characters (%d, %s) to see what comes after.

Code improvement happens when?

Can happen right after semantic analysis; the earlier it happens, the better the improvement. It cannot change the result of the program.

dynamic semantic rules are enforced by the

compiler, which inserts specific code to perform the check at run time. Example: no divide by zero; if the denominator is 0, the inserted check reports an error.
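
A toy sketch of the idea in Python, assuming a made-up expression tree of nested tuples and a hypothetical helper checked_div. The "compiler" here (emit) generates source text and wraps every division in a call that performs the check when the generated code runs.

```python
def checked_div(num, denom):
    """Run-time check that a compiler could insert around a division."""
    if denom == 0:
        raise ZeroDivisionError("dynamic semantic check failed: divide by zero")
    return num / denom


def emit(node):
    """Translate a tiny expression tree into Python source text."""
    if isinstance(node, (int, str)):      # literal or variable name
        return str(node)
    op, left, right = node
    left_src, right_src = emit(left), emit(right)
    if op == "/":
        # The denominator is unknown at compile time, so emit a check.
        return f"checked_div({left_src}, {right_src})"
    return f"({left_src} {op} {right_src})"
```

The check cannot be done statically in general because the value of the denominator depends on run-time data.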

compiling is a ______ analysis of the ________ ______

complete analysis of source code.

Parse Tree (Concrete syntax tree)

completely shows how a sequence of tokens was derived using CFG.

examples of static semantic checks

Declaration of an identifier before use.
No use of an identifier in an inappropriate context.
Correct number and type of parameters in subroutine calls.
Distinct constant labels on the branches of a switch.
A function with a non-void return type returns a value.
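
As an illustration of the first check, a minimal Python sketch that walks a flat list of events with a symbol table. The ("declare", name) / ("use", name) event format and the function name check_declarations are assumptions; a real compiler would walk the parse tree and handle scopes.

```python
def check_declarations(events):
    """Return a list of error messages for a sequence of declare/use events."""
    symbol_table = set()
    errors = []
    for kind, name in events:
        if kind == "declare":
            if name in symbol_table:
                errors.append(f"{name}: declared twice")
            symbol_table.add(name)
        elif kind == "use":
            if name not in symbol_table:
                errors.append(f"{name}: used before declaration")
    return errors
```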

improved cfg

Eliminate explicit alternation (expr -> expr add_op term, expr -> term).
Eliminate left recursion and use right recursion instead (we used left recursion originally for associativity).
Moving the recursion from left to right generates identical strings:
(left recursion)
X -> X a
X -> B
is replaced by (right recursion)
X -> B X'
X' -> a X'
X' -> e
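
The X -> Xa, X -> B rewrite can be sketched mechanically. This Python version handles only immediate left recursion on a single non-terminal; the function name, the tuple representation of right-hand sides, and the use of an empty tuple for e are assumptions.

```python
def eliminate_left_recursion(nt, rules):
    """Rewrite the rules of one non-terminal.

    rules is a list of right-hand sides, each a tuple of symbols.
    Returns a dict mapping non-terminals to their new rule lists.
    """
    # Split X -> X alpha rules from the rest, keeping only alpha.
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nt]
    others = [rhs for rhs in rules if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: list(rules)}      # nothing to do
    tail = nt + "'"                   # the new non-terminal X'
    return {
        nt: [rhs + (tail,) for rhs in others],                  # X  -> B X'
        tail: [alpha + (tail,) for alpha in recursive] + [()],  # X' -> a X' | e
    }
```

Applied to the rules above, `eliminate_left_recursion("X", [("X", "a"), ("B",)])` produces the right-recursive form with the extra non-terminal X'.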

interpretation is translate ______ ______

every time

What does FLEX do?

Flex takes your formal description of token categories and spits out a C program that you can compile.

How was the first compiler written?

In assembly and machine code.

All FAs are finite, but the language accepted by an FA can be

infinite; additionally, each string in the language is itself finite.

CFGs can refer to themselves, the definition of recursive, unlike REs which have

left or right recursion exclusively

In a tree, the leaves of an expression are always

literals or ids

C doesn't have any dynamic semantic checks

the hardware checks (divide by 0)

Java has a lot of static and dynamic semantic checks

to catch bugs and malicious code (buffer overflows, SQL injection)

Why would you write assembly?

to touch certain parts of the hardware, or for performance

Dynamic semantic checks (enforced at runtime)

Variables are not used in an expression unless they have been assigned a value.
Pointers are dereferenced only when they refer to a valid object.
Array subscripts are within bounds.
Arithmetic expressions don't overflow.
These depend on info that isn't known until run time.

P-code (byte code)

very simple language that is easy to translate to machine code

Formal notation (CFGs and REs) is good because

we can process them mechanically, which means we can write programs to manipulate them.

Self hosting compiler

Written in its own language (e.g. a C compiler written in C). It is compiled the first time by bootstrapping (each version knows enough to get to the next level).

