Syntax Analysis and CFGs

Syntax analysis, or parsing, is the second phase of a compiler.

Why is a CFG needed?

The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce a sequence of tokens as output.

A lexical analyzer can identify tokens with the help of regular expressions and pattern rules.
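As a rough illustration of how regular expressions identify tokens, here is a minimal sketch of a tokenizer. The token names (NUMBER, IDENT, OP, SKIP) and their patterns are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token classes for illustration: (name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all patterns into one regex with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the input characters into (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, tokenize("x = (42 + y)") yields a flat sequence of tokens. Nothing in it checks whether the parentheses are properly paired.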

However, a lexical analyzer cannot check the syntax of a given sentence, because regular expressions are too limited.

For example, regular expressions cannot check that paired tokens, such as parentheses, are balanced to an arbitrary nesting depth.

Therefore, the syntax analysis phase uses a context-free grammar (CFG), whose language is recognized by a push-down automaton.
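The role of the stack in a push-down automaton can be sketched with a small balance checker. This is only an illustration of the idea, assuming the input is a sequence of single-character tokens. The function name is_balanced is hypothetical.

```python
def is_balanced(tokens):
    """Return True if every '(' is closed by a later ')' in the right order."""
    stack = []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)      # remember an unmatched opening parenthesis
        elif tok == ")":
            if not stack:          # closing parenthesis with nothing to match
                return False
            stack.pop()
        # any other token is ignored in this sketch
    return not stack               # leftover '(' means the input is unbalanced
```

The stack gives the checker unbounded memory for nesting depth, which is exactly what a finite automaton (and therefore a regular expression) lacks.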

Context-free grammars are a superset of regular grammars.

This means that every regular grammar is also context-free, but some problems, such as matching nested parentheses, are beyond the scope of regular grammars.

A CFG is a helpful tool for describing the syntax of programming languages.

What is a Context-Free Grammar?

A context-free grammar (CFG) consists of a finite set of grammar rules.

A context-free grammar has four components:

  1. Non-terminals (V): Non-terminals are syntactic variables that denote sets of strings.
  2. Terminal symbols (Σ): Terminals are the basic symbols from which strings are formed.
  3. Productions (P): The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of terminals and/or non-terminals called the right side of the production.
  4. Start symbol (S): The designated non-terminal from which every derivation begins.

For example, consider the CFG

S -> 0 | 1A, A -> 1

Its four components are:

V = {S, A}
Σ = {0, 1}
P = {S -> 0, S -> 1A, A -> 1}
Start symbol = S
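The four components of this example grammar can be written down as plain data, and the strings it generates can be enumerated by repeatedly expanding the leftmost non-terminal. The following is a minimal sketch. The variable names and the max_len bound are assumptions added for illustration.

```python
from collections import deque

NON_TERMINALS = {"S", "A"}          # V
TERMINALS = {"0", "1"}              # Σ
PRODUCTIONS = {                     # P: left side -> list of right sides
    "S": [["0"], ["1", "A"]],
    "A": [["1"]],
}
START = "S"                         # start symbol


def generate(max_len=10):
    """Return every terminal string derivable from START.

    Sentential forms longer than max_len are discarded, as a simple
    (assumed) bound for grammars that grow without limit.
    """
    results = set()
    queue = deque([(START,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if only terminals remain.
        idx = next((i for i, sym in enumerate(form) if sym in NON_TERMINALS), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_len:
                queue.append(new_form)
    return results
```

For this grammar, generate() returns the set {"0", "11"}: S derives 0 directly, or 1A, and A derives only 1.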

What is a Syntax Analyzer?

A syntax analyzer, or parser, takes its input from the lexical analyzer in the form of a token stream.

The parser analyzes the source code (token stream) against the production rules to detect any errors in the code.

The output of this phase is a parse tree.
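To make this concrete, here is a minimal sketch of a hand-written recursive-descent parser for the small example grammar S -> 0 | 1A, A -> 1. The nested-tuple tree representation and the function names are assumptions chosen for illustration. Production parsers are usually more elaborate or generated by tools.

```python
def parse(tokens):
    """Parse a list of tokens with the grammar S -> 0 | 1A, A -> 1.

    Returns a parse tree as nested tuples, or raises SyntaxError.
    """
    pos = 0  # index of the next unread token

    def expect(tok):
        """Consume the token `tok` or report a syntax error."""
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == tok:
            pos += 1
            return tok
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected {tok!r} at position {pos}, found {found!r}")

    def parse_A():
        # A -> 1
        return ("A", expect("1"))

    def parse_S():
        # S -> 0 | 1A, chosen by looking at the next token
        if pos < len(tokens) and tokens[pos] == "0":
            return ("S", expect("0"))
        return ("S", expect("1"), parse_A())

    tree = parse_S()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

Calling parse(["1", "1"]) returns ("S", "1", ("A", "1")), while parse(["1", "0"]) raises SyntaxError because no production allows 0 after 1.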