Writing the lexical analyzer with lexical error checking


The first project involves writing the lexical analyzer with lexical error checking, and the compilation listing generator for the compiler.

The specification for the lexical structure of the language is the following:

Comments begin with -- and end with the end of the line. White space between tokens is permitted but not required.

Identifiers must begin with a letter, followed by letters or digits. Integer literals consist of a sequence of digits. Real literals consist of a sequence of digits containing a decimal point. At least one digit must be before the decimal point. Boolean literals are true and false.

The logical operators are not, and, and or. Each logical operator should be a separate token. The relational operators are =, /=, >, >=, <, and <=. All six lexemes should be represented by a single token. The adding operators are the binary + and -. Both lexemes should be represented by a single token. The multiplying operators are * and /. Both lexemes should be represented by a single token.

The following punctuation symbols should be accepted: commas, colons, semicolons, and parentheses.

The following are reserved words: begin, boolean, else, end, endif, function, if, is, integer, real, returns, then.

The lexical analyzer should be created using flex. The compiler should produce a listing of the program with lexical error messages included after the line in which they occur. Any character that cannot start any token should be considered a lexical error.
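The identifier, literal, and comment rules can be pinned down as regular expressions before writing the flex rules. The sketch below is not flex source; it uses the C++ standard library's std::regex only to make the patterns concrete. The function names are hypothetical. It also assumes that an identifier may be a single letter and that digits after the decimal point of a real literal are optional, neither of which the specification states outright.

```cpp
#include <regex>
#include <string>

// Hypothetical helper names; only the patterns come from the specification.
// Similar expressions would appear as rule patterns in the flex source.

// A letter followed by zero or more letters or digits (assumed reading).
bool isIdentifier(const std::string& s) {
    static const std::regex pattern("[A-Za-z][A-Za-z0-9]*");
    return std::regex_match(s, pattern);
}

// One or more digits.
bool isIntegerLiteral(const std::string& s) {
    static const std::regex pattern("[0-9]+");
    return std::regex_match(s, pattern);
}

// At least one digit before the decimal point; digits after it are
// treated as optional here (an assumption).
bool isRealLiteral(const std::string& s) {
    static const std::regex pattern("[0-9]+\\.[0-9]*");
    return std::regex_match(s, pattern);
}

// Two hyphens followed by anything up to the end of the line.
bool isComment(const std::string& s) {
    static const std::regex pattern("--.*");
    return std::regex_match(s, pattern);
}
```

Note that the identifier pattern also matches every reserved word. When two flex rules match a lexeme of the same length, flex chooses the rule listed first, so the reserved-word rules would normally be placed before the identifier rule.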

The compiler should also generate a file containing the lexeme-token pairs as a means to verify that the lexical analyzer is working correctly. Only token numbers are required, not token names.

The token numbers for the punctuation symbols should be the ASCII value of the character. The remaining tokens should be numbered sequentially beginning at 256.
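One possible way to realize this numbering scheme is an enumeration whose first member is set to 256. The token names and their order below are assumptions made for illustration; only the scheme itself (character codes for punctuation, sequential numbers from 256 for everything else) comes from the specification. The trailing underscore in BEGIN_ is there because flex itself defines a BEGIN macro.

```cpp
// Hypothetical token names; the specification fixes only the numbering scheme.
enum Tokens {
    RELOP = 256,    // =, /=, >, >=, <, <= share one token
    ADDOP,          // + and - share one token
    MULOP,          // * and / share one token
    ANDOP, OROP, NOTOP,                          // one token per logical operator
    IDENTIFIER, INT_LITERAL, REAL_LITERAL, BOOL_LITERAL,
    BEGIN_, BOOLEAN, ELSE, END, ENDIF, FUNCTION,
    IF, IS, INTEGER, REAL, RETURNS, THEN         // reserved words
};

// Punctuation symbols use their own character code as the token number,
// so ';' is 59 and '(' is 40 in ASCII.
inline int punctuationToken(char c) {
    return static_cast<unsigned char>(c);
}
```

Because every character code is below 256, the two ranges can never collide.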

Decoupling the listing code

Good object-oriented design calls for decoupling the code that displays the error messages from the flex source code. In the later stages of this project, there will be syntactic and semantic errors. The code for displaying and counting these errors should be separate from the flex and bison code since it will be called from both places.

In the skeleton code that I have provided, I call the functions as members of a Listing class. Defining Listing as a class really amounts to defining a singleton object. One way to create a singleton is to make all the functions and data static. The standard practice in C++ is to put the class definition in the header file listing.h and the bodies of the member functions in a corresponding .cc file.

Because, unlike Java, C++ does not require all functions to be put in classes, another alternative would be to define these as ordinary functions that are not members of any class. In that case the function prototypes would still go in listing.h and the function bodies in the corresponding .cc file.
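As a rough illustration of the all-static layout, the sketch below shows both halves in one block, with comments marking which part would go in listing.h and which in the .cc file. The currentLine accessor and the lineNumber member are hypothetical, and nextLine is reduced to counting lines, which is much less than the real function would do.

```cpp
// listing.h (sketch): every member is static, so no Listing object is created.
class Listing {
public:
    static void nextLine();       // advance to the next source line
    static int currentLine();     // hypothetical accessor, for illustration only
private:
    static int lineNumber;        // shared state lives in a static data member
};

// listing.cc (sketch): a static data member needs one out-of-class definition.
int Listing::lineNumber = 1;

void Listing::nextLine() { ++lineNumber; }

int Listing::currentLine() { return lineNumber; }
```

Callers then write Listing::nextLine() without ever constructing an object. With the ordinary-function alternative, the same declarations would appear in listing.h without the class wrapper, and lineNumber would become a file-local variable in the .cc file.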

In the call to the appendError function I have passed in an argument named LEXICAL, which designates the error as a lexical error. My suggestion would be to define an enumerated type as follows:

enum ErrorType {LEXICAL, SYNTAX, SEMANTIC};

The listing.h file would be a good place to put this enumerated type definition. By supplying the error type, the Listing class can keep a count of the number of messages of each kind, which should be displayed when the end of the program is reached. Adding another function to the Listing class to display the error count is the best way to accomplish this.
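The counting idea can be sketched with an array indexed by the enumerated type. The names errorCounts, countError, and errorSummary are hypothetical and are not part of the skeleton, and the wording of the summary is only a guess at a reasonable format. The summary is returned as a string here so that it can be printed or checked.

```cpp
#include <array>
#include <sstream>
#include <string>

enum ErrorType {LEXICAL, SYNTAX, SEMANTIC};

// Hypothetical names; one counter per error kind, indexed by the enumerator.
static std::array<int, 3> errorCounts = {0, 0, 0};

// Record one more error of the given kind.
void countError(ErrorType type) {
    ++errorCounts[type];
}

// Build the end-of-program summary; the output wording is an assumption.
std::string errorSummary() {
    std::ostringstream out;
    out << "Lexical Errors " << errorCounts[LEXICAL] << '\n'
        << "Syntax Errors " << errorCounts[SYNTAX] << '\n'
        << "Semantic Errors " << errorCounts[SEMANTIC] << '\n';
    return out.str();
}
```

This works because an unscoped enum converts implicitly to an integer, so LEXICAL, SYNTAX, and SEMANTIC serve directly as the indices 0, 1, and 2.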

Finally, let me explain the purpose of the appendError function. It should queue up the error messages so they are displayed at the end of the line. All error messages that occurred on that line can then be displayed by the nextLine function once the line is complete.
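A minimal sketch of that queueing behavior follows, written as ordinary functions. The exact signatures in the skeleton are not shown here, so the parameter list of appendError, the message format, and the line-number prefix are all assumptions. To keep the sketch easy to check, nextLine returns the text it would display instead of printing it.

```cpp
#include <sstream>
#include <string>
#include <vector>

enum ErrorType {LEXICAL, SYNTAX, SEMANTIC};

static std::vector<std::string> pendingErrors;   // messages for the current line
static int lineNumber = 1;

// Queue the message instead of printing it, so that it cannot interrupt
// the source line that is still being echoed. (Assumed signature.)
void appendError(ErrorType type, const std::string& message) {
    static const char* const labels[] = {"Lexical", "Syntax", "Semantic"};
    pendingErrors.push_back(std::string(labels[type]) + " Error, " + message);
}

// Called when the end of a line is reached: emit every queued message,
// empty the queue, then start the prefix for the next line.
std::string nextLine() {
    std::ostringstream out;
    for (const std::string& error : pendingErrors)
        out << error << '\n';
    pendingErrors.clear();
    ++lineNumber;
    out << lineNumber << "  ";
    return out.str();
}
```

In the flex source, the rule that matches an invalid character would call appendError, and the rule that matches a newline would call nextLine, so messages always appear directly beneath the line that caused them.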
