Token:
- Think of a token as a categorized unit in a program’s code.
- It consists of two parts:
  - The token name, which is a label for the type of unit (like “number” or “operator”).
  - An optional attribute value, which gives extra information about the token (like the specific value of a number).
- In a program, tokens can be things like words (keywords), numbers, symbols (like “+” or “(”), and so on.
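To make the name/attribute pair concrete, here is a minimal Python sketch. The class name `Token`, its field names, and the token names used (“id”, “operator”, “number”, “lparen”) are assumptions chosen for illustration, not part of any particular compiler.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    """A token: a name plus an optional attribute value (hypothetical layout)."""
    name: str                                    # token name, e.g. "number"
    attribute: Optional[Union[int, str]] = None  # extra information, if any


# Tokens a lexer might produce for the input:  count + 42
tokens = [
    Token("id", "count"),     # attribute: the identifier's spelling
    Token("operator", "+"),   # attribute: which operator it is
    Token("number", 42),      # attribute: the numeric value
]

# Some tokens need no attribute: the name alone says everything.
lparen = Token("lparen")
```

Here the attribute is what tells two tokens with the same name apart, for example the number 42 from the number 7.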
Lexeme:
- A lexeme is the actual sequence of characters in the source code that matches the pattern for a token.
- It’s like the specific instance of a token found in the code.
- For example, if the language has a keyword token for “if”, then every place the characters “if” appear as that keyword in the code is a lexeme of that token.
Pattern:
- A pattern is a description of the form that the lexemes of a token can take.
- It’s like a blueprint or rule that defines what a valid token looks like.
- Regular expressions are commonly used to define patterns. A regular expression is a compact notation that describes a set of strings, so it can state exactly which character sequences count as lexemes of a token.
- For instance, if we have a keyword token pattern, the pattern might simply be the sequence of characters that make up that keyword, like “if” or “while”.
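The three terms fit together as follows: each token name is paired with a pattern, and every piece of source text that matches a pattern is a lexeme of that token. The sketch below uses Python's standard `re` module. The token names, the patterns themselves, and the helper `tokenize` are assumptions made up for this example, not a fixed standard.

```python
import re

# Hypothetical (token name, pattern) pairs. Order matters: the keyword
# pattern is listed before the identifier pattern so that "if" is
# reported as a keyword rather than as an identifier.
TOKEN_PATTERNS = [
    ("keyword",  r"\b(?:if|while)\b"),
    ("id",       r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number",   r"\d+"),
    ("operator", r"[+\-*/<>=]"),
    ("lparen",   r"\("),
    ("rparen",   r"\)"),
    ("skip",     r"\s+"),  # whitespace: matched but not reported
]

# Combine all patterns into one regex with a named group per token.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(source):
    """Return a list of (token name, lexeme) pairs found in source."""
    result = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "skip":
            # match.lastgroup is the token name; match.group() is the lexeme.
            result.append((match.lastgroup, match.group()))
        pos = match.end()
    return result


for name, lexeme in tokenize("if (x < 10)"):
    print(name, repr(lexeme))
```

In this sketch the lexeme “if” matches the keyword pattern, while a longer word such as “iffy” fails the word-boundary check in that pattern and falls through to the identifier pattern instead.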