Programming Assignment

Program Specification

1. Your program must read 8-bit ASCII strings from standard input -- for instance, using the cin object in C++, or stdin in C. You must consume all input from standard input.

2. You must tokenize the entire input stream, using the lexical specification below to dictate how to break the stream into tokens.

Lexical Specification:

letter      -> [a-zA-Z]
digit       -> [0-9]
newline     -> \n
for         -> for
while       -> while
if          -> if
else        -> else
identifier  -> (letter|_)(letter|digit|_)*
integer     -> digit+
float       -> (digit+\.digit*)|(\.digit+)
string      -> "[^"\n]*"
whitespace  -> (' ' | \t)+
comment     -> #.*
operator    -> '!' | '%' | '&' | '|' | '+' | '-' |
               '*' | ',' | '/' | '{' | '}' | '[' | ']' | ';' |
               '<' | '>' | '=' | '<=' | '>=' | '!=' | ':='
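The integer and float patterns above overlap on their leading digits, so a scanner has to decide how far to read. The sketch below assumes longest-match ("maximal munch") scanning, which the specification above does not state explicitly. The names `NumKind`, `NumMatch`, and `match_number` are hypothetical.

```cpp
#include <cstddef>
#include <string>

enum class NumKind { None, Integer, Float };  // hypothetical names

struct NumMatch {
    NumKind kind;     // which pattern matched, if any
    std::size_t len;  // number of bytes matched (0 when kind == None)
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Try to match integer (digit+) or float ((digit+\.digit*)|(\.digit+))
// starting at s[pos], preferring the longest match (an assumption).
NumMatch match_number(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && is_digit(s[i])) ++i;      // leading digits
    const std::size_t lead = i - pos;

    if (i < s.size() && s[i] == '.') {
        std::size_t j = i + 1;
        while (j < s.size() && is_digit(s[j])) ++j;  // digits after '.'
        const std::size_t frac = j - (i + 1);
        // "12." and "12.5" need leading digits; ".5" needs trailing digits.
        if (lead > 0 || frac > 0) return {NumKind::Float, j - pos};
    }
    if (lead > 0) return {NumKind::Integer, lead};
    return {NumKind::None, 0};                       // e.g. a lone "."
}
```

Under this reading, `12.` is a single float of length 3 rather than an integer followed by something else, and a lone `.` matches neither pattern.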

3. You must then output tokens to standard output (e.g., via cout in C++ or stdout in C), using the token information chart below to dictate what information you print about each token. Note that the lexical pattern, when italicized, refers to the pattern from the lexical specification above.

1. If the input contains unterminated strings, then instead of generating a string token, generate a single ERR2 token. The position indicator for the token should correspond to the beginning quote starting the string. Consume all input up to (but not including) the first newline. If there is no next newline, consume all remaining input. The length associated with the token should be reported appropriately.

2. The alphabet for this assignment consists of the following:
a. ASCII 0x09 and 0x0a (tab and newline)
b. ASCII 0x20 through 0x7e (all printable ASCII characters).

3. If you are in the middle of processing a string (meaning, you have seen the opening quote but not the end quote), and you then see a character that is not part of the alphabet, treat the bad character as if it were a newline for the sake of processing the string. That is, generate an ERR2 token for an unterminated string, and have the token end right before the bad character (meaning tokenization should resume at the bad character).

4. When you see one or more consecutive characters that are not in the alphabet, group them together and generate an ERR3 token. The length associated with the token should be the number of consecutive characters that are not in our alphabet. Resume tokenizing as normal after the bad characters.
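The error rules above (unterminated strings and runs of bad characters) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the names `in_alphabet`, `Scan`, `scan_string`, and `scan_bad_run` are hypothetical, and the caller is assumed to hold the whole input in a `std::string`.

```cpp
#include <cstddef>
#include <string>

// True if byte c is in the assignment's alphabet:
// tab (0x09), newline (0x0a), or printable ASCII 0x20..0x7e.
bool in_alphabet(unsigned char c) {
    return c == 0x09 || c == 0x0a || (c >= 0x20 && c <= 0x7e);
}

struct Scan {         // hypothetical result type
    bool ok;          // true: terminated string; false: unterminated (ERR2)
    std::size_t len;  // bytes consumed, counted from the opening quote
};

// Precondition (assumed): s[pos] is the opening '"'.
Scan scan_string(const std::string& s, std::size_t pos) {
    std::size_t i = pos + 1;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '"') return {true, i + 1 - pos};  // include closing quote
        if (c == '\n' || !in_alphabet(c)) break;   // stop BEFORE newline/bad byte
        ++i;
    }
    return {false, i - pos};  // ERR2 covers [pos, i); scanning resumes at i
}

// Length of the run of out-of-alphabet bytes starting at pos (ERR3 length).
std::size_t scan_bad_run(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && !in_alphabet(static_cast<unsigned char>(s[i]))) ++i;
    return i - pos;
}
```

The cast to `unsigned char` matters because plain `char` may be signed, in which case bytes at or above 0x80 would compare as negative values.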

5. When outputting a token, your output must consist of the following, in order:
a. "TID:", with no spaces (or other characters) preceding. All letters shall be output as capital ASCII letters.
b. The colon may optionally be followed by spaces.
c. The Token ID of the token you are outputting. Token IDs must start at 1 and increase by 1 for each token of the input.
d. A single comma (note the token ID shall NOT be followed by spaces)
e. The comma may optionally be followed by spaces
f. "TYPE:". All letters shall be output as capital ASCII letters.
g. The colon may optionally be followed by spaces.
h. An integer representing the Numeric Type of the token.
i. The integer must only be followed by a left parenthesis- "(", meaning no spaces before the "(".
j. The left parenthesis must be followed by the "English Type" of the token (as indicated in the table above - case sensitive!), with no spaces preceding.
k. The English Type must be followed by a right parenthesis and comma- "),", meaning no spaces before the "),"
l. The comma may optionally be followed by spaces.
m. "POS:". All letters shall be output as capital ASCII letters.
n. The colon may be optionally followed by spaces.
o. An integer representing the position of the first character in the original input that led to the token match. The position is numbered from 0, and represents the number of 8-bit ASCII characters in the input stream that precede the character in question.
p. The integer must be followed by a single comma, with NO spaces (or other characters) in between.
q. The comma may optionally be followed by spaces.
r. "LEN:". All letters shall be output as capital ASCII letters.
s. An integer representing the number of bytes matched in the current token.
t. If the chart above indicates "None" in the "Value to output" column, then print a single newline (ASCII 0x0a - '\n'). The newline may optionally be preceded by spaces. IF THERE IS NO VALUE TO OUTPUT, YOU ARE DONE PRINTING THIS TOKEN.
u. The rest of the bullets are for when you are outputting a value only.
v. Output a comma (with NO preceding spaces), optionally followed by spaces, followed by "VALUE:", optionally followed by spaces.
w. Output the value, per the "Value to output" column above. All items that have a value should be a simple copy of the input, except for strings, which should omit the quotation marks around the string.
x. Print a single newline (ASCII 0x0a - '\n'). The newline may optionally be preceded by spaces.
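The output steps a through x can be sketched as a single formatting function. The `Token` struct and `format_token` are hypothetical; the numeric and English type values must come from the token information chart and are simply passed in here. The sketch emits one space wherever spaces are explicitly optional and none elsewhere; note that the steps above do not mention optional spaces after "LEN:", so none is written there.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct Token {              // hypothetical layout
    std::size_t id;         // Token ID, starting at 1
    int type_num;           // Numeric Type (from the chart)
    std::string type_name;  // English Type (from the chart, case sensitive)
    std::size_t pos;        // 0-based offset of the first matched byte
    std::size_t len;        // number of bytes matched
    bool has_value;         // false when the chart says "None"
    std::string value;      // value to print (strings: without the quotes)
};

// Build one output line for a token, ending in a single '\n'.
std::string format_token(const Token& t) {
    std::ostringstream out;
    out << "TID: " << t.id << ", "                               // steps a-e
        << "TYPE: " << t.type_num << '(' << t.type_name << "), " // steps f-l
        << "POS: " << t.pos << ", "                              // steps m-q
        << "LEN:" << t.len;                                      // steps r-s
    if (t.has_value) out << ", VALUE: " << t.value;              // steps v-w
    out << '\n';                                                 // step t or x
    return out.str();
}
```

Returning a string rather than printing directly keeps the function easy to check in a test harness; the caller can write the result with `std::cout << format_token(tok);`.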
6. Your program must take an optional command-line argument that dictates which tokens get output.
a. If no command line argument is given, then you must output all tokens in the token stream.
b. If the command line argument is a 0, you must also output all tokens in the token stream.
c. If the command line argument is a 1, you must output all tokens EXCEPT comments, whitespace, errors and newlines.
d. If the command line argument is a 2, you must output ONLY tokens for comments, whitespace, errors and newlines.
e. If the command line consists of anything other than the above four options, then you should IGNORE all input from stdin, assume the input length is 0, and populate the token stream with only a single token of type ERR1, which you will then output, per below.
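The argument handling in item 6 might be sketched like this. The names `Mode`, `parse_mode`, and `should_print` are hypothetical. Two points are assumptions rather than stated rules: the argument is compared as an exact string (so "00" is treated as invalid), and more than one argument is treated as invalid.

```cpp
#include <string>

enum class Mode { All, NoTrivia, OnlyTrivia, Invalid };  // hypothetical names

// Map the command line to an output mode.
Mode parse_mode(int argc, const char* const* argv) {
    if (argc <= 1) return Mode::All;        // no argument: output everything
    if (argc == 2) {
        const std::string arg = argv[1];
        if (arg == "0") return Mode::All;
        if (arg == "1") return Mode::NoTrivia;    // skip comments/whitespace/errors/newlines
        if (arg == "2") return Mode::OnlyTrivia;  // only those four kinds
    }
    return Mode::Invalid;                   // caller emits a single ERR1 token
}

// `is_trivia` is true for comment, whitespace, error, and newline tokens.
bool should_print(Mode mode, bool is_trivia) {
    switch (mode) {
        case Mode::NoTrivia:   return !is_trivia;
        case Mode::OnlyTrivia: return is_trivia;
        default:               return true;  // All, and Invalid (ERR1 is printed)
    }
}
```

In `main`, the `argv` parameter can be passed directly, since `char**` converts implicitly to `const char* const*` in C++.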

7. After reading in the entire input and generating tokens, output all tokens per the above specification. When you are done outputting all tokens you are supposed to output, then output the following:
a. An additional newline (creating a blank line)
b. The string "Totals:" (case sensitive, as with all strings in this assignment)
c. Optional space(s)
d. The string "len"
e. Optional space(s)
f. An equals sign
g. Optional space(s)
h. An integer indicating the length of the input stream (always 0 with ERR1, remember!)
i. A comma
j. Optional space(s)

k. The string "tokens"
l. Optional space(s)
m. An equals sign
n. Optional space(s)
o. An integer indicating the number of tokens in the token stream.
p. A comma
q. Optional space(s)
r. The string "printed"
s. Optional space(s)
t. An equals sign
u. Optional space(s)
v. An integer indicating the number of tokens you OUTPUT
w. Optional space(s)
x. A single newline.
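The totals line in item 7 can be produced in one step. The function name `format_totals` is hypothetical; the sketch writes one space wherever the steps above allow optional spaces and none between each integer and the comma that follows it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Build the blank line plus the "Totals:" line described in item 7.
std::string format_totals(std::size_t len, std::size_t tokens,
                          std::size_t printed) {
    std::ostringstream out;
    out << '\n'                               // step a: blank line
        << "Totals: len = " << len            // steps b-h
        << ", tokens = " << tokens            // steps i-o
        << ", printed = " << printed << '\n'; // steps p-x
    return out.str();
}
```

For the ERR1 case, the call would be `format_totals(0, 1, 1)`, since the input length is treated as 0 and the stream holds exactly one token, which is printed.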
8. After you finish outputting, your program must exit.
2. Other Requirements
You will receive a 0 on this if any of these requirements are not met!

9. The assignment is due on February 13 at 8am Eastern time. Late assignments will lose one letter grade per 24 hours.

10. The program must be written entirely in C or C++

11. You must submit a single source code file, unless you choose to use multiple files, in which case you must submit a single ZIP file, and nothing else.

12. If submitting a ZIP file, when the file unzips, your source files must unzip into the same directory (including any header files you need).

13. If submitting a ZIP file, there must not be ANY other files contained within the ZIP file. Again, you will get a 0 if there are.

14. If your program is written in C, it must compile ON MY REFERENCE ENVIRONMENT into an executable with the following command line: cc *.c -o assignment1

15. If your program is written in C++, it must compile ON MY REFERENCE ENVIRONMENT into an executable with the following command line: c++ *.cpp -o assignment1

16. Your program should print nothing to stderr under any circumstances.

17. Your program's output will be tested in the reference environment only. Even if it works on your desktop, if it doesn't work in the reference environment, you will get a 0. With C and C++ this is a common occurrence due to memory errors, so be sure to test in the reference environment!

18. You must submit the homework through the course website, unless otherwise pre-approved by the professor.

19. You may not give or receive any help from other people on this assignment.

20. You may NOT use code from any other program, no matter who authored it.

3. Test Cases

Below are six sample test cases for you, which I will use in my testing. Typically, I use anywhere from 20 to 50 test cases (generally more rather than fewer). I will definitely use the cases below. I strongly recommend you create your own test harness and come up with a large number of test cases to help you get the best possible grade.

Attachment: Programming Assignment.rar
