To convert a regular expression to a context-free grammar (CFG), you can follow a set of standard conversion rules.
Here are the rules for converting a regular expression to a CFG:
1. Terminal Symbols:
Each character or symbol in the regular expression becomes a terminal symbol in the CFG.
2. Start Symbol:
Create a new start symbol for the CFG.
3. Concatenation:
For every concatenation (ab) in the regular expression, add a new production rule in the CFG. The left-hand side of the rule should be a non-terminal symbol representing the concatenation, and the right-hand side should be the concatenation of the non-terminal symbols representing the individual characters/symbols.
4. Union:
For every union (a + b) in the regular expression, add a new production rule in the CFG. The left-hand side of the rule should be a non-terminal symbol representing the union, and the right-hand side should have two alternatives, one for each character/symbol in the union.
5. Kleene Star:
For every Kleene star (a*) in the regular expression, add a new production rule in the CFG. The left-hand side of the rule should be a non-terminal symbol representing the Kleene star, and the right-hand side should have two alternatives. One alternative should be the non-terminal symbol representing the character/symbol, followed by the non-terminal symbol representing the Kleene star itself. The other alternative should be ε (epsilon), indicating an empty string.
6. Parentheses:
If the regular expression contains parentheses, treat the contents within the parentheses as a separate expression and apply the conversion rules recursively.
By following these rules, you can convert a regular expression into an equivalent context-free grammar (CFG). Note that the resulting CFG may contain additional non-terminal symbols and production rules compared to the original regular expression.
Practice problem on Regular expresion to CFG
Q1. Write given CFG for R.E (011 + 1)*(01)* (RGPV 2020).
Here’s the context-free grammar (CFG) for the given regular expression (011 + 1)(01):
Non-terminal symbols:
S: Start symbol
A: Represents the regular expression (011 + 1)*
B: Represents the regular expression (01)*
Terminal symbols:
0, 1: Binary digits
Production rules:
- S -> AB
- A -> 011A
- A -> 1A
- A -> ε (epsilon)
- B -> 01B
- B -> ε (epsilon)
Explanation of the CFG:
The CFG consists of two non-terminal symbols, A and B, which represent the two parts of the regular expression. Non-terminal symbol S is the start symbol, which represents the entire regular expression (011 + 1)(01).
The production rules define the valid derivations for each non-terminal symbol. Rule 1 states that the start symbol S can be derived as AB, meaning it can be split into parts A and B. Rule 2 defines that A can be derived as 011A, indicating that the regular expression (011 + 1) can repeat zero or more times. Rule 3 states that A can be derived as 1A, allowing the regular expression 1 to repeat zero or more times. Rule 4 allows A to be derived as ε (epsilon), indicating that it can be empty.
Similarly, Rule 5 defines that B can be derived as 01B, indicating that the regular expression 01 can repeat zero or more times. Rule 6 allows B to be derived as ε (epsilon), indicating that it can be empty.
By applying these production rules, you can generate strings that match the given regular expression.
Q2. Regular Expression: (a + b)*
A. CFG:
Start Symbol: S
Terminal Symbols: a, b
Production Rules:
S -> Sa
S -> Sb
S -> ε (epsilon)
Q3. Regular Expression: ab
A. CFG:
Start Symbol: S
Terminal Symbols: a, b
Production Rules:
S -> aS
S -> bS
S -> ε (epsilon)
Q4. Regular Expression: (aa + b)*c
A. CFG:
Start Symbol: S
Terminal Symbols: a, b, c
Production Rules:
S -> SS
S -> a
S -> b
S -> c
Q5. Regular Expression: (a + b)c*
A. CFG:
Start Symbol: S
Terminal Symbols: a, b, c
Production Rules:
S -> aS
S -> bS
S -> cS
S -> ε (epsilon)
Q6. Regular Expression: (abc*)+
A. CFG:
Start Symbol: S
Terminal Symbols: a, b, c
Production Rules:
S -> T+
T -> aT
T -> bT
T -> cT
T -> ε (epsilon)