LEX

Introduction

  • Lex is a lexical analyzer generator: the scanner it produces reads the input stream and converts sequences of characters into tokens.
  • The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens).
  • Lex is not a complete language, but rather a generator representing a new language feature which can be added to different programming languages, called host languages.

What is a Token in Lex?

A token is a classification of a group of characters.

For example:

Lexeme        Token
=             EQUAL_OP
*             MULT_OP
,             COMMA
(             LEFT_PAREN
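
The classification above maps directly onto Lex translation rules. A minimal sketch (the token names are illustrative constants, assumed to be defined elsewhere, e.g. in a parser header):

```lex
"="   { return EQUAL_OP; }    /* lexeme "=" classified as EQUAL_OP */
"*"   { return MULT_OP; }
","   { return COMMA; }
"("   { return LEFT_PAREN; }
```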
  • Lex is a tool for writing lexical analyzers.
  • Lex reads a specification file containing regular expressions and generates a C routine that performs lexical analysis by matching input sequences against those expressions to identify tokens.

A Lex program consists of 3 sections:

  • The first section, the declarations, includes declarations of variables, constants, and regular definitions.
  • The second section contains the translation rules, each consisting of a regular expression and the action to perform when it matches.
  • The third section holds whatever program subroutines are needed by the actions.
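
The three-section layout can be sketched as follows; the `%%` lines separate the sections, and the word counting here is purely illustrative:

```lex
%{
/* Section 1: declarations — C includes, variables, constants */
#include <stdio.h>
int word_count = 0;
%}
LETTER   [a-zA-Z]

%%
   /* Section 2: translation rules — pattern { action } */
{LETTER}+   { word_count++; }        /* count each word */
.|\n        { /* ignore everything else */ }

%%
/* Section 3: user subroutines needed by the actions */
int main(void) {
    yylex();
    printf("words: %d\n", word_count);
    return 0;
}
int yywrap(void) { return 1; }
```

Running `lex file.l` (or `flex file.l`) produces `lex.yy.c`, which is then compiled with a C compiler to obtain the scanner.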

Ambiguous Source Rules:

  • Lex can handle ambiguous specifications. When more than one expression can match the current input, Lex chooses as follows:
  • The longest match is preferred.
  • Among rules that match the same number of characters, the rule given first is preferred.
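
Both disambiguation rules can be seen in this sketch (token names IF and ID are illustrative):

```lex
if          { return IF; }      /* for input "if", both rules match 2 characters;
                                   this rule is listed first, so IF is returned */
[a-z]+      { return ID; }      /* for input "ifdef", this rule matches 5 characters
                                   versus 2 for "if", so the longest match wins: ID */
```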