Compilation Phases

Lexical analysis:

This is the initial part of reading and analyzing the program text: the text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword, or number.
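
As a rough illustration of dividing text into tokens, here is a minimal sketch of a regular-expression-based lexer. The token classes (NUMBER, KEYWORD, IDENT, OP) and the keyword set are assumptions invented for this example, not part of any particular language.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
# Order matters: KEYWORD must be tried before IDENT.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|then|else|let)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split program text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("let x = 42 + y"))
```

Each token records both its class and the text it was built from, which is what the next phase needs in order to build a tree.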

Syntax analysis:

This phase takes the list of tokens produced by the lexical analysis and arranges them in a tree structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
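
To make the idea of arranging tokens into a tree concrete, the following is a small recursive-descent sketch. It assumes a made-up expression grammar with only `+`, `*`, and parentheses, and it assumes tokens arrive as plain strings; trees are represented as nested tuples of the form (operator, left, right).

```python
def parse(tokens):
    """Build a syntax tree for the assumed grammar:
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> NAME_OR_NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()  # a leaf: variable name or number

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["a", "+", "b", "*", "2"]))  # ('+', 'a', ('*', 'b', '2'))
```

Note how the tree places `*` below `+`, so the structure of the program (here, operator precedence) is visible in the shape of the tree rather than in the token order.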

Type checking:

This phase analyzes the syntax tree to determine if the program violates certain consistency requirements, e.g., if a variable is used but not declared, or if it is used in a context that doesn’t make sense given the type of the variable, such as trying to use a Boolean value as a function pointer.
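
The two kinds of violation mentioned above (undeclared variables and type mismatches) can be sketched with a small tree walk. This example assumes a hypothetical tree format of nested tuples (operator, left, right), two types named "int" and "bool", and an environment dictionary mapping declared variable names to their types.

```python
def check(node, env):
    """Return the type of an expression tree, or raise TypeError.

    node is a literal (int or bool), a variable name (str),
    or a tuple (op, left, right). env maps declared names to types.
    """
    if isinstance(node, bool):  # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(node, int):
        return "int"
    if isinstance(node, str):
        if node not in env:
            raise TypeError(f"variable {node!r} is used but not declared")
        return env[node]

    op, left, right = node
    left_type = check(left, env)
    right_type = check(right, env)
    if op in ("+", "*"):
        if left_type != "int" or right_type != "int":
            raise TypeError(f"operator {op!r} needs int operands, "
                            f"got {left_type} and {right_type}")
        return "int"
    if op == "and":
        if left_type != "bool" or right_type != "bool":
            raise TypeError(f"operator 'and' needs bool operands, "
                            f"got {left_type} and {right_type}")
        return "bool"
    raise TypeError(f"unknown operator {op!r}")


env = {"x": "int", "done": "bool"}
print(check(("+", "x", 1), env))  # int
```

Calling `check(("+", "x", "done"), env)` raises a TypeError because a Boolean is used where an integer is required, and `check("y", env)` raises one because `y` was never declared.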

Intermediate code generation:

The program is translated to a simple machine-independent intermediate language.
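
One common style of machine-independent intermediate language is three-address code, in which every instruction applies a single operator and stores the result in a temporary. The sketch below assumes that style and a hypothetical tree format of nested tuples (operator, left, right); the temporary names `t1`, `t2`, ... and the `:=` notation are choices made for this example.

```python
import itertools


def to_three_address(tree):
    """Flatten an expression tree into a list of three-address instructions.

    Returns (code, result), where result names the value of the whole tree.
    """
    code = []
    counter = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)  # leaf: variable name or constant
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result


code, result = to_three_address(("+", "a", ("*", "b", 2)))
print("\n".join(code))
# t1 := b * 2
# t2 := a + t1
```

Nothing in the output refers to registers or to a particular instruction set, which is what makes the representation machine-independent.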

Register allocation:

The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
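
The mapping from names to register numbers can be sketched in its most naive form: give every distinct name the next free number. This is a deliberately simplified assumption; practical allocators also reuse registers once a value is no longer needed and move values to memory when registers run out, neither of which is shown here.

```python
def allocate_registers(names, num_registers):
    """Map each distinct symbolic name to a register number.

    Naive sketch: no register reuse and no spilling to memory.
    """
    mapping = {}
    for name in names:
        if name not in mapping:
            if len(mapping) >= num_registers:
                raise RuntimeError(f"out of registers while allocating {name!r}")
            mapping[name] = len(mapping)
    return mapping


print(allocate_registers(["a", "b", "t1", "a", "t2"], num_registers=4))
# {'a': 0, 'b': 1, 't1': 2, 't2': 3}
```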

Machine code generation:

The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
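
As a sketch of this translation, the example below turns register-allocated intermediate instructions into assembly text. The target is an invented three-operand machine: the mnemonics `ADD`, `SUB`, `MUL` and the `rN` register syntax are assumptions for illustration and do not describe any real architecture.

```python
# Hypothetical mapping from intermediate-code operators to mnemonics.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def emit_assembly(instructions):
    """Translate (dest, left, op, right) tuples of register numbers
    into assembly lines for the assumed three-operand machine."""
    lines = []
    for dest, left, op, right in instructions:
        lines.append(f"{MNEMONICS[op]} r{dest}, r{left}, r{right}")
    return lines


program = [
    (2, 1, "*", 3),  # r2 := r1 * r3
    (4, 0, "+", 2),  # r4 := r0 + r2
]
print("\n".join(emit_assembly(program)))
# MUL r2, r1, r3
# ADD r4, r0, r2
```

Targeting a different architecture would mean swapping this step for another one while leaving the earlier phases unchanged.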

Assembly and linking:

The assembly language code is translated into a binary representation, and the addresses of variables, functions, etc., are determined.
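
Determining addresses can be illustrated with a two-pass sketch: the first pass records where each label ends up, and the second pass replaces label references with those addresses. The assembly syntax (labels ending in `:`) and the assumption that every instruction occupies exactly one address unit are simplifications made for this example; encoding to actual binary is not shown.

```python
def resolve_labels(lines):
    """Replace label references with numeric addresses.

    Assumes labels are written as 'name:' on their own line and that
    each instruction occupies one address unit.
    """
    addresses = {}
    instructions = []
    # Pass 1: record the address of each label.
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            addresses[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line)
    # Pass 2: substitute addresses for label names.
    resolved = []
    for instr in instructions:
        parts = instr.split()
        resolved.append(" ".join(str(addresses.get(p, p)) for p in parts))
    return resolved, addresses


source = ["start:", "LOAD r0", "JMP end", "ADD r0", "end:", "HALT"]
resolved, addresses = resolve_labels(source)
print(resolved)   # ['LOAD r0', 'JMP 3', 'ADD r0', 'HALT']
print(addresses)  # {'start': 0, 'end': 3}
```

Two passes are used because `JMP end` refers to a label that has not yet been seen when the jump is read.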

The first three phases (lexical analysis, syntax analysis, and type checking) are collectively called the frontend of the compiler, and the last three phases (register allocation, machine code generation, and assembly and linking) are collectively called the backend. The middle part of the compiler is, in this context, only the intermediate code generation, but it often also includes various optimizations and transformations on the intermediate code.

Placements Reading Material (IT)

Monday, February 14, 2011