Compiling - Fundamcental Steps Of Compiling
Elementary Steps Of Compiling
Front End
Preprossessing
Lexical Analysis
- Turn the preprocessed source code to the token stream.
- Token: a tuple of a category code and a value.
- Categories: Key words, Identifiers, Constants, Operators, Separators.
- Each key word, operator, separator have its own category code; many identifiers share a same category code, and constants are categorised by type.
Syntax Analysis
- This step turns source code to an AST
Semantic Analysis
- Collect the identifiers by reading declarations.
- Conduct semantic check.
- Check double declarations, using of undeclared identifiers.
- Check type mismatch of operators and operands, or between operands.
IR Generation
- There’re comman and inportant IRs, TAC (Three Address Code) or AST (Abstract Syntax Tree).
Back End
Optimisation
- Equivalently transform the IR to a more efficient form.
RTL Generation
- Generate the Register Transfer Language.
Target Code Generation
- Generate the target code.
Binary Utilities
- Assembling
- Turn the assembly code to machine code.
- Linking
- Combine the object files to an executable file.
Development Of Compilers
Bootstrapping
- The compilers are written in machine code before 1970s.
- After 1980s, the compilers are written in high-level languages, which is called bootstrapping.
Each compiler have at lease 3 languages related to it: Source Language, Target Language, and Implementation Language.
There’s a T-diagram to describe compilers: S for source language programs, T for target language programs, and I for implementation language.
Bootstrapping: Use machine code to write a basic compiler for the source language, then use the basic implemented compiler to implement more advanced compilers for the source language, until the compiler can compile itself.
Transplanting, a.k.a. Cross-Compiling
Use the compiler works on architecture A to compile the source code to the target code for architecture B.
- A typical method to transplant and bootstrap.
+++++++++++++++ +++++++++++++++
| L' B | | L' B |
+++++++++++++++ +++++++++++++++ +++++
| L B | L | L B | B |
+++++ +++++++++++++++ +++++++++++
| L | L B | B |
+++++++++++ +++++++++++
| A |
+++++++