Compiling - Fundamcental Steps Of Compiling

Compiling - Fundamcental Steps Of Compiling

Elementary Steps Of Compiling

Front End

Preprossessing

Lexical Analysis

  • Turn the preprocessed source code to the token stream.
  • Token: a tuple of a category code and a value.
  • Categories: Key words, Identifiers, Constants, Operators, Separators.
  • Each key word, operator, separator have its own category code; many identifiers share a same category code, and constants are categorised by type.

Syntax Analysis

  • This step turns source code to an AST

Semantic Analysis

  • Collect the identifiers by reading declarations.
  • Conduct semantic check.
    • Check double declarations, using of undeclared identifiers.
    • Check type mismatch of operators and operands, or between operands.

IR Generation

  • There’re comman and inportant IRs, TAC (Three Address Code) or AST (Abstract Syntax Tree).

Back End

Optimisation

  • Equivalently transform the IR to a more efficient form.

RTL Generation

  • Generate the Register Transfer Language.

Target Code Generation

  • Generate the target code.

Binary Utilities

  • Assembling
    • Turn the assembly code to machine code.
  • Linking
    • Combine the object files to an executable file.

Development Of Compilers

Bootstrapping

  • The compilers are written in machine code before 1970s.
  • After 1980s, the compilers are written in high-level languages, which is called bootstrapping.

Each compiler have at lease 3 languages related to it: Source Language, Target Language, and Implementation Language.

There’s a T-diagram to describe compilers: S for source language programs, T for target language programs, and I for implementation language.

Bootstrapping: Use machine code to write a basic compiler for the source language, then use the basic implemented compiler to implement more advanced compilers for the source language, until the compiler can compile itself.

Transplanting, a.k.a. Cross-Compiling

Use the compiler works on architecture A to compile the source code to the target code for architecture B.

  • A typical method to transplant and bootstrap.
          +++++++++++++++     +++++++++++++++
          | L'        B |     | L'        B |
+++++++++++++++     +++++++++++++++     +++++
| L         B |  L  | L         B |  B  |
+++++     +++++++++++++++     +++++++++++
    |  L  | L         B |  B  |
    +++++++++++     +++++++++++
              |  A  |
              +++++++