Compiling - Language & Grammar
Some Concepts
- String: A sequence of finite symbols. If we define a word as a string, then the letters of the alphabet are the symbols; if we define a sentence as a string, then the words, whitespaces, and punctuations are the symbols.
- Alphabet: A finite set of symbols. Since alphabet is a set, there’re some operates on it as well, such as clousure, Kleene clousure, product, etc.
- Sentences, words are not constructed with random symbols in the alphabet, but with a set of rules, which is called grammar.
Grammar
In English, sentences consists of a noun phrase and a verb phrase. A noun phrase can consist of attributes and a simpler noun phrase (or even a noun), etc.
The formal definition of grammar is a 4-tuple $(V_T, V_N, P, S)$, where $V_T$ is the set of terminal symbols, $V_N$ is the set of non-terminal symbols, $P$ is the set of production rules, and $S$ is the start symbol.
Derivation & Reduction
Derivattion is the process of replacing a non-terminal symbol with a string of terminal symbols according to the production rules. Reduction is the reverse process of derivation. This is a top-down process. Reversely, we can also do a bottom-up process, which is called reduction.
Categorisation of Grammar
- Context Sensitive Grammar (CSG) is a grammar that has the form of $A \to \alpha$ where $\alpha$ is a string of symbols and $A$ is a non-terminal symbol.
- Context Free Grammar (CFG) is a grammar that has the form of $A \to \alpha$ where $\alpha$ is a string of symbols and $A$ is a non-terminal symbol. Regular Grammar is a grammar that has the form of $A \to aB$ or $A \to a$ where $a$ is a terminal symbol and $B$ is a non-terminal symbol.
- Regular Grammar is a grammar that has the form of $A \to aB$ or $A \to a$ where $a$ is a terminal symbol and $B$ is a non-terminal symbol.
Some concepts related to parse tree of CFG
- Parse tree: A tree that represents the derivation of a string according to the production rules of a CFG.
- Direct phrase: The sub-trees whose height is 2.