Week 1 - Compiler
What are terminal symbols?
Terminals (Σ) are the basic symbols from which strings are formed.
How does the * symbol work in a regular expression?
The * symbol means zero or more copies. Example: a* corresponds to any string of a's: {ε, a, aa, aaa, ...}. (0+1)* corresponds to all binary strings.
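A minimal sketch of the star operator, assuming Python's `re` module as the checker. Python writes union as `|` rather than the `+` used in these notes, so (0+1)* becomes `(0|1)*`. The helper name `matches` is made up for illustration.

```python
import re

# a* : zero or more copies of 'a' (the empty string ε is included)
A_STAR = re.compile(r"a*")

# (0+1)* in the notes' notation; Python's re spells union as '|'
BINARY = re.compile(r"(0|1)*")

def matches(pattern, text):
    """Return True when the whole string belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None

print(matches(A_STAR, ""))        # ε belongs to a*
print(matches(A_STAR, "aaa"))     # three copies of 'a'
print(matches(A_STAR, "ab"))      # 'b' is not allowed
print(matches(BINARY, "010011"))  # any binary string
```

`fullmatch` is used so that the entire string must belong to the language, not just a prefix of it.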
What are productions?
The productions (P) of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.
What are the components of a context-free grammar?
- A set of non-terminals - A set of tokens, known as terminal symbols - A set of productions - One of the non-terminals, designated as the start symbol (S), from which derivation begins
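The four components can be sketched as a small data structure. This is only an illustration: the class name `Grammar`, its field names, and the example grammar `EXPR` are assumptions, not part of the notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    non_terminals: frozenset   # V
    terminals: frozenset       # Σ (the tokens)
    productions: tuple         # P, stored as (left side, right side) pairs
    start: str                 # S

    def is_well_formed(self):
        """Check that the four components are consistent with each other."""
        symbols = self.non_terminals | self.terminals
        return (
            self.start in self.non_terminals
            and not (self.non_terminals & self.terminals)
            and all(
                left in self.non_terminals and all(s in symbols for s in right)
                for left, right in self.productions
            )
        )

# Hypothetical example grammar: E -> E + T | T,  T -> id
EXPR = Grammar(
    non_terminals=frozenset({"E", "T"}),
    terminals=frozenset({"+", "id"}),
    productions=(("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("id",))),
    start="E",
)
print(EXPR.is_well_formed())
```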
What does a finite automaton consist of?
- Finite set of states (Q) - Finite set of input symbols (Σ) - One Start state (q0) - Set of final states (qf) - Transition function (δ)
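The five components map directly onto a small simulator. The sketch below is a deterministic automaton; the machine itself (states `q0` and `q1`, accepting binary strings that end in 1) is a made-up example.

```python
def run_dfa(delta, start, finals, text):
    """Feed text through the transition function and report acceptance."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in delta:      # symbol outside Σ, or no transition defined
            return False
        state = delta[key]
    return state in finals

# Hypothetical automaton accepting binary strings that end in 1
Q = {"q0", "q1"}                  # finite set of states
SIGMA = {"0", "1"}                # finite set of input symbols
DELTA = {                         # transition function δ
    ("q0", "0"): "q0",
    ("q0", "1"): "q1",
    ("q1", "0"): "q0",
    ("q1", "1"): "q1",
}
START = "q0"                      # start state
FINALS = {"q1"}                   # set of final states

print(run_dfa(DELTA, START, FINALS, "0101"))
print(run_dfa(DELTA, START, FINALS, "10"))
```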
What is a non-deterministic finite automaton (NFA)?
- Given the current state and input symbol, there could be multiple next states - The next state may be chosen at random - All the next states may be followed in parallel
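The "in parallel" view can be sketched by tracking the set of all states the machine could be in. This sketch assumes an NFA without ε-transitions; the example machine (accepting binary strings whose second-to-last symbol is 1) and the state names are hypothetical.

```python
def run_nfa(delta, start, finals, text):
    """Follow every possible next state at once; accept if any path ends in a final state."""
    current = {start}
    for symbol in text:
        current = {
            nxt
            for state in current
            for nxt in delta.get((state, symbol), set())
        }
    return bool(current & finals)

# Hypothetical NFA: from p0, reading 1 may stay in p0 or move to p1
NFA_DELTA = {
    ("p0", "0"): {"p0"},
    ("p0", "1"): {"p0", "p1"},
    ("p1", "0"): {"p2"},
    ("p1", "1"): {"p2"},
}
NFA_START = "p0"
NFA_FINALS = {"p2"}

print(run_nfa(NFA_DELTA, NFA_START, NFA_FINALS, "0110"))
print(run_nfa(NFA_DELTA, NFA_START, NFA_FINALS, "0101"))
```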
What are the regular operations that regular expressions use?
- Union - Concatenation - Star - Brackets are used for grouping
What is an ambiguous grammar?
A grammar is ambiguous if at least one string has more than one parse tree (equivalently, more than one leftmost derivation or more than one rightmost derivation).
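As an illustration, the classic grammar E -> E + E | E * E | id is ambiguous. The sketch below counts parse trees for a token sequence under that specific grammar only (it is not a general ambiguity test); the function name `count_parse_trees` is made up.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi]
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

# id + id * id can be grouped as (id + id) * id or id + (id * id)
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count above 1 for any single string is enough to show that the grammar is ambiguous.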
What is the intermediate code generator?
After semantic analysis, the compiler generates intermediate code from the source code for the target machine. This code represents a program for some abstract machine and sits between the high-level language and the machine language. It should be generated in a way that makes it easy to translate into target machine code.
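One common intermediate form is three-address code, which the notes do not name; it is used here only as an assumed example. The sketch translates an expression given as nested tuples `(op, left, right)`, a hypothetical tree format, into a list of instructions with temporaries `t1`, `t2`, and so on.

```python
import itertools

def to_three_address(expr):
    """Translate a nested tuple expression into three-address instructions."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        if isinstance(node, str):      # a variable name is already an operand
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(temps)}"       # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    return code, result

# a + b * c
code, result = to_three_address(("+", "a", ("*", "b", "c")))
print(code)    # ['t1 = b * c', 't2 = a + t1']
print(result)  # t2
```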
What are tokens?
A token is an object that represents something else, such as another object (either physical or virtual) or an abstract concept. In programming, a token is the basic component of source code. Characters are grouped into tokens, which fall into one of five classes according to their function (constants, identifiers, operators, reserved words, and separators), in accordance with the rules of the programming language.
What are non-terminal symbols?
Non-terminals(V) are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar.
What is semantic analysis?
Semantic analysis checks whether the parse tree follows the rules of the language. For example, it checks that values are assigned between compatible data types and reports errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types, and expressions, for example whether identifiers are declared before use. It produces an annotated syntax tree as output.
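Two of these checks, declaration before use and type compatibility on assignment, can be sketched as follows. The statement format (`("declare", name, type)` and `("assign", name, value_type)`) and the error messages are invented for the example.

```python
def check(statements):
    """Return a list of semantic errors for a sequence of simple statements."""
    declared = {}   # identifier -> declared type
    errors = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "declare":
            _, name, type_name = stmt
            declared[name] = type_name
        elif kind == "assign":
            _, name, value_type = stmt
            if name not in declared:
                errors.append(f"{name}: used before declaration")
            elif declared[name] != value_type:
                errors.append(
                    f"{name}: cannot assign {value_type} to {declared[name]}"
                )
    return errors

program = [
    ("declare", "x", "int"),
    ("assign", "x", "string"),   # incompatible types
    ("assign", "y", "int"),      # y was never declared
]
print(check(program))
```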
What is a compiler?
A compiler is computer software that transforms code written in one programming language into another language, usually from a high-level language into machine language. Compilers are a type of translator that supports digital devices, primarily computers.
What is the final code generator?
In this phase, the code generator takes the optimized intermediate code and maps it to the target machine language. It translates the intermediate code into a sequence of (generally) relocatable machine instructions that perform the same task as the intermediate code.
What is a regular expression?
A regular expression is a pattern that describes a language, i.e. a set of strings. It is built from individual symbols using union, concatenation, and star.
What is a symbol table?
It is a data structure maintained throughout all the phases of a compiler. All identifier names, along with their types, are stored here. The symbol table lets the compiler quickly search for an identifier's record and retrieve it. It is also used for scope management.
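A minimal sketch of a symbol table with scope management, assuming a stack of dictionaries (one per scope). The class and method names are illustrative only; real compilers store much more per identifier.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier name to its type."""

    def __init__(self):
        self.scopes = [{}]             # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) > 1:       # never drop the global scope
            self.scopes.pop()

    def declare(self, name, type_name):
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")            # shadows the outer x
print(table.lookup("x"))               # float
table.exit_scope()
print(table.lookup("x"))               # int
```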
What is a derivation?
A derivation is a sequence of production-rule applications that produces the input string from the start symbol. During parsing, two decisions are made for each sentential form: - Deciding which non-terminal to replace - Deciding which production rule to use to replace it
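The two decisions can be sketched for a leftmost derivation, where the first decision is fixed (always replace the leftmost non-terminal) and only the production has to be chosen. The function name and the example grammar (E -> E + T | T, T -> id) are assumptions made for the illustration.

```python
def leftmost_derivation(start, non_terminals, choices):
    """Apply the chosen productions, always rewriting the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for left, right in choices:
        # Decision 1: the non-terminal to replace is the leftmost one
        idx = next((i for i, s in enumerate(form) if s in non_terminals), None)
        if idx is None or form[idx] != left:
            raise ValueError(f"production for {left} does not apply to {form}")
        # Decision 2: the production rule is supplied by the caller
        form[idx:idx + 1] = list(right)
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation(
    "E",
    {"E", "T"},
    [("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("id",)), ("T", ("id",))],
)
print(" => ".join(steps))  # E => E + T => T + T => id + T => id + id
```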
What is Syntax analysis?
Syntax analysis, also called hierarchical analysis or parsing, is the second phase of a compiler. It takes the tokens produced by lexical analysis as input, in the form of a token stream, and checks their arrangement against the production rules of the source-language grammar, i.e. the parser checks whether the expression formed by the tokens is syntactically correct. It reports any errors it finds and generates a parse tree (or syntax tree).
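A small recursive-descent parser shows the idea of checking tokens against a grammar and building a tree. Recursive descent is only one parsing technique, chosen here for brevity; the grammar (expr -> term ('+' term)*, term -> NUMBER | '(' expr ')') and the tuple-based tree format are assumptions.

```python
def parse(tokens):
    """Parse a list of token strings and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def term():
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if tok is not None and tok.isdigit():
            pos += 1
            return ("num", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:             # leftover tokens are a syntax error
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["1", "+", "(", "2", "+", "3", ")"]))
```

A token sequence that does not fit the grammar, such as `["1", "+"]`, raises `SyntaxError` instead of producing a tree.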
What does the symbol + mean in a regular expression?
It means union, i.e. "or". Example: 0+1 means either a zero or a one.
What are the phases of a compiler?
Lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and final code generation.
How does concatenation work in a regular expression?
The concatenation of two regular expressions is obtained by writing one after the other. Example: (0+1)0 corresponds to {00, 10}. (0+1)(0+ε) corresponds to {00, 0, 10, 1}.
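Both union and concatenation can be checked on small finite languages by treating each language as a Python set of strings. This is only a sketch for finite cases; the helper names `union` and `concat` are made up, and ε is represented by the empty string.

```python
EPS = ""  # the empty string ε

def union(a, b):
    """Language of (a + b): strings that are in either language."""
    return a | b

def concat(a, b):
    """Language of (a b): every string of a followed by every string of b."""
    return {x + y for x in a for y in b}

zero_or_one = union({"0"}, {"1"})                              # (0+1)
print(sorted(concat(zero_or_one, {"0"})))                      # ['00', '10']
print(sorted(concat(zero_or_one, union({"0"}, {EPS}))))        # ['0', '00', '1', '10']
```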
What is Lexical Analysis?
Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors, which is written in the form of sentences, reads it as a stream of characters, and breaks it into a series of tokens, removing any whitespace and comments. The character sequence in the source code that matches a token is called a lexeme. If the lexical analyzer finds an invalid token, it generates an error. The program performing lexical analysis is also called a scanner or tokenizer. It works closely with the syntax analyzer: it reads character streams from the source code, checks for legal tokens, and passes them to the syntax analyzer on demand.
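A minimal scanner sketch using Python's `re` module. The token classes, their patterns, and the `#` comment syntax are assumptions chosen for the example, not rules of any particular language.

```python
import re

# Hypothetical token classes; order matters when patterns overlap
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SEP", r"[();]"),
    ("SKIP", r"\s+|#[^\n]*"),   # whitespace and '#' comments are dropped
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Turn a character stream into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 42 + y  # note"))
```

Each pair holds the token class and the lexeme, i.e. the exact characters that matched.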
What is code optimization?
This phase optimizes the intermediate code. Optimization can be thought of as removing unnecessary lines of code and rearranging the sequence of statements to speed up program execution without wasting resources (CPU, memory).
What is a finite automaton?
A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet). The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
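Constant folding is one simple optimization of this kind: sub-expressions whose operands are all constants are computed at compile time. It is used here as an assumed example, since the notes do not name a specific optimization. Expressions are nested tuples `(op, left, right)`, a hypothetical tree format.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace constant sub-expressions with their computed value."""
    if not isinstance(node, tuple):
        return node                    # a constant or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)    # both sides known: compute now
    return (op, left, right)

print(fold(("+", "x", ("*", 2, 3))))   # ('+', 'x', 6)
print(fold(("*", ("+", 1, 2), 4)))     # 12
```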
What is Context-Free Grammar?
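A context-free grammar (CFG) is a grammar in which the left side of every production is a single non-terminal. Context-free grammars are a superset of regular grammars: every regular language can be described by a context-free grammar, but not every context-free language is regular.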