Due: Tue, April 22, 23:59 (11:59 PM)
Construct a scanner for the SRC3 language described on the official class page: http://www.cs.ucla.edu/classes/spring03/cs132/src3.html.
(In addition to the comments described there, which begin with /* and end with */ and may span multiple lines and be nested, we could also have allowed single-line comments that begin with // and end at the end of the line (newline), with the first comment-opening symbol deciding the style of the comment. Consider how that would be implemented. It is NOT part of this assignment.)
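Because comments can nest, no single regular expression can match them; the usual approach is to count nesting depth once /* is seen (for example, in the lex action for that pattern, reading further characters with input()). The C sketch below does the same over an in-memory string; the function name skip_comment and its interface are hypothetical.

```c
#include <stddef.h>

// Skips a possibly nested comment. `s` points just past an opening "/*".
// Each newline inside the comment increments *line, so later tokens keep
// correct line numbers. Returns a pointer just past the matching "*/",
// or NULL if the input ends before the comment is closed.
const char *skip_comment(const char *s, int *line)
{
    int depth = 1;                               // already inside one comment
    while (*s != '\0') {
        if (s[0] == '/' && s[1] == '*') {        // nested opener
            depth++;
            s += 2;
        } else if (s[0] == '*' && s[1] == '/') { // closer
            s += 2;
            if (--depth == 0)
                return s;
        } else {
            if (*s == '\n')
                (*line)++;
            s++;
        }
    }
    return NULL;                                 // unclosed comment
}
```

A NULL return is the point at which the scanner would report an error such as "unclosed comment.".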
Details of the output format will be covered in the discussion sections, and posted on the class web page, as will details of the electronic submission of your program.
You should use LEX, or FLEX, for this exercise, and should base your program on the finite automata approach described in class and in the Louden textbook.
Since the later stages will call on the scanner to deliver tokens as they need them, you may find it convenient to write a driver routine that gets tokens one at a time and does the rest of the processing.
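A minimal sketch of such a driver in C. The token source is passed in as a function; with lex/flex it would be yylex(), which returns one token code per call and 0 at end of input. The demo source, the handler, and the token codes are hypothetical placeholders.

```c
// Hypothetical token source standing in for yylex(): replays a fixed
// sequence of token codes, then keeps returning 0 (end of input).
static const int demo_tokens[] = { 3, 7, 7, 0 };
static int demo_pos = 0;

static int demo_source(void)
{
    int tok = demo_tokens[demo_pos];
    if (tok != 0)
        demo_pos++;
    return tok;
}

// Hypothetical per-token processing: here it just accumulates the codes.
static int demo_sum = 0;

static void demo_handler(int tok)
{
    demo_sum += tok;
}

// The driver: pull tokens one at a time until the source returns 0,
// handing each one to `handle`. Returns the number of tokens processed.
int drive(int (*get_token)(void), void (*handle)(int))
{
    int count = 0;
    int tok;
    while ((tok = get_token()) != 0) {
        handle(tok);
        count++;
    }
    return count;
}
```

Since flex's default yylex() takes no arguments and returns int, a real scanner could call drive(yylex, handler) with a handler that prints each token.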
Please follow the specification described here as closely as possible, since
grading is done mostly by scripts that compare outputs. Deviations may cost
points unless you come and argue your case, which costs extra time and effort
for both you and the graders.
The program should read its input from standard input (which you or a grader
can redirect to read any file).
Each keyword has a token with the same name in uppercase (e.g. the keyword var
has the token VAR). There is one exception: the keyword begin has the token
KWBEGIN, to avoid a name conflict with the lex macro BEGIN.
Identifiers (function names, variable names, etc.) have the token type ID, and
the identifier's name must be included (e.g. ID(variable)).
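Rather than writing one lex pattern per keyword, a common technique is to match every identifier-shaped word with a single pattern and then look it up in a reserved-word table. A minimal C sketch, assuming keywords are case-sensitive. Only var and begin are named in this handout, so the table is deliberately incomplete; the remaining SRC3 keywords would come from the language page. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Reserved-word table: lexeme -> token name (incomplete; see above).
struct keyword { const char *lexeme; const char *token; };

static const struct keyword keywords[] = {
    { "var",   "VAR"     },
    { "begin", "KWBEGIN" },   // not BEGIN, which is a lex macro
};

// Returns the keyword's token name, or NULL if `lexeme` is an identifier.
// Matching is case-sensitive, which is an assumption about SRC3.
const char *keyword_token(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].token;
    return NULL;
}

// Formats the token for an identifier-shaped lexeme into buf:
// a keyword token such as "VAR", or "ID(name)" for an identifier.
void word_token(const char *lexeme, char *buf, size_t n)
{
    const char *kw = keyword_token(lexeme);
    if (kw != NULL)
        snprintf(buf, n, "%s", kw);
    else
        snprintf(buf, n, "ID(%s)", lexeme);
}
```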

Operator | Token Name
!        | NOT
*        | MUL
/        | DIV
%        | MOD
+        | PLUS
-        | MINUS
<        | LT
>        | GT
=        | EQ
[        | LBRACKET
]        | RBRACKET
(        | LPAREN
)        | RPAREN
:        | COLON
;        | SEMI
,        | COMMA
<>       | NEQ
<=       | LTE
>=       | GTE
||       | OR
&&       | AND
:=       | ASSIGN
..       | SUBRANGE

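Several operators share a prefix (< with <> and <=, : with :=, and a single . with ..), so the scanner must prefer the longest match, which lex does automatically for its patterns. A hand-written equivalent in C simply checks two-character operators before one-character ones; match_operator is a hypothetical name.

```c
#include <string.h>

// Operator table from the handout. Two-character operators come first so
// the longest match wins ("<>" over "<"), the same rule lex applies.
struct op { const char *text; const char *token; };

static const struct op ops[] = {
    { "<>", "NEQ" }, { "<=", "LTE" }, { ">=", "GTE" }, { "||", "OR" },
    { "&&", "AND" }, { ":=", "ASSIGN" }, { "..", "SUBRANGE" },
    { "!", "NOT" }, { "*", "MUL" }, { "/", "DIV" }, { "%", "MOD" },
    { "+", "PLUS" }, { "-", "MINUS" }, { "<", "LT" }, { ">", "GT" },
    { "=", "EQ" }, { "[", "LBRACKET" }, { "]", "RBRACKET" },
    { "(", "LPAREN" }, { ")", "RPAREN" }, { ":", "COLON" },
    { ";", "SEMI" }, { ",", "COMMA" },
};

// Matches an operator at the start of `s`. On success stores the token
// name in *token and returns the number of characters consumed; returns 0
// if no operator starts here (e.g. a lone '|', '&' or '.').
size_t match_operator(const char *s, const char **token)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t len = strlen(ops[i].text);
        if (strncmp(s, ops[i].text, len) == 0) {
            *token = ops[i].token;
            return len;
        }
    }
    return 0;
}
```

Because / is DIV while /* opens a comment, comment detection has to run before this lookup.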
Literal type | Token Name
integer      | LITINT
real         | LITREAL
string       | LITSTRING

The value of the literal should be included (e.g. LITREAL(4.5)).
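The exact syntax of real literals is given on the SRC3 page, not here. Assuming, for illustration only, that an integer is a run of digits and a real is digits.digits, the subrange operator creates a classic trap: 1..10 must scan as LITINT(1) SUBRANGE LITINT(10), not as a malformed real. In the finite automata approach this is handled by running the DFA as far as it can go and falling back to the last accepting state. A C sketch; all names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

enum numkind { NUM_NONE, NUM_INT, NUM_REAL };

// DFA for the assumed syntax:  START -digit-> INT -'.'-> DOT -digit-> REAL
// INT and REAL are accepting states; DOT is not.
enum state { S_START, S_INT, S_DOT, S_REAL, S_DEAD };

static enum state step(enum state st, char c)
{
    int digit = isdigit((unsigned char)c) != 0;
    switch (st) {
    case S_START: return digit ? S_INT : S_DEAD;
    case S_INT:   return digit ? S_INT : (c == '.' ? S_DOT : S_DEAD);
    case S_DOT:   return digit ? S_REAL : S_DEAD;
    case S_REAL:  return digit ? S_REAL : S_DEAD;
    default:      return S_DEAD;
    }
}

// Runs the DFA as far as it can go, remembering the last accepting state
// (longest match). Returns the length of the number and sets *kind, or
// returns 0 with NUM_NONE if s does not start with a digit.
size_t scan_number(const char *s, enum numkind *kind)
{
    enum state st = S_START;
    size_t i = 0, last_len = 0;
    *kind = NUM_NONE;
    while (s[i] != '\0') {
        st = step(st, s[i]);
        if (st == S_DEAD)
            break;
        i++;
        if (st == S_INT || st == S_REAL) {
            last_len = i;
            *kind = (st == S_INT) ? NUM_INT : NUM_REAL;
        }
    }
    return last_len;
}
```

On 1..10 the scan returns 1 (an integer), and scanning resumes at "..", which becomes SUBRANGE. In lex the same behavior falls out of the patterns [0-9]+ and [0-9]+"."[0-9]+ under longest match.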
For a string literal, the string's length must also be reported, after a
semicolon. An empty string "" is possible; it corresponds to the token
LITSTRING(;0). Inside a string, a doubled quote "" stands for a single quote
character, so for the input "a""b" the token is LITSTRING(a"b;3).
Note: for """ or similar input, the correct output is a LITSTRING(;0) token
followed by an error about a missing quote: the longest valid string literal
is "", and the remaining " has no closing quote.
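A C sketch of string scanning under these rules. Every quote is a candidate closing quote, and the scanner keeps the last one that ends a valid literal (longest match, as lex does), which is what makes """ produce LITSTRING(;0) before the error. It assumes a string literal cannot span a newline, which this handout does not state; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Scans a string literal starting at s (s[0] must be '"'). Inside the
// literal a doubled quote "" stands for one '"' character.
// Every '"' is a possible closing quote; the scan remembers the last one
// that ends a valid literal, so the result is the longest valid literal.
// The decoded text goes to val (truncated to fit cap) and its length to
// *vlen. Returns the number of input characters consumed, or 0 if no
// closing quote exists.
// Assumption (not stated in the handout): a literal cannot span a newline.
size_t scan_string(const char *s, char *val, size_t cap, size_t *vlen)
{
    size_t i = 1, n = 0;           // input position, decoded length
    size_t best = 0, best_n = 0;   // end and length of longest valid literal
    while (s[i] != '\0' && s[i] != '\n') {
        char c = s[i];
        if (c == '"') {
            best = i + 1;          // closing here would be valid
            best_n = n;
            if (s[i + 1] != '"')
                break;             // lone quote: the literal ends here
            i++;                   // doubled quote: decode as one '"'
        }
        if (n + 1 < cap)
            val[n] = c;
        n++;
        i++;
    }
    if (cap > 0)
        val[best_n < cap ? best_n : cap - 1] = '\0';
    *vlen = best_n;
    return best;
}

// Formats the token as LITSTRING(text;length), e.g. LITSTRING(a"b;3).
void format_string_token(char *buf, size_t n, const char *val, size_t vlen)
{
    snprintf(buf, n, "LITSTRING(%s;%zu)", val, vlen);
}
```

On """, scan_string consumes 2 characters (the empty literal); called again on the remaining ", it returns 0, which is where the missing-quote error would be reported.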
Note that there is no boolean literal.
Output goes to standard output. For each token in the input, print one line
that identifies the token and the line number on which it occurs, in the
following format:
token:\tline#
Here \t denotes the tab character. Example:
ID(testProgram): 1
SEMI: 1
KWBEGIN: 2
LITINT(5): 10
Line numbers start at 1 (the first line of the input is line 1).
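A small C sketch of the record format and 1-based line counting. A real scanner would print directly with printf and track the line as it consumes newlines (flex can also maintain a line counter via %option yylineno); the buffer-based helpers and their names here are hypothetical, chosen so the output can be checked.

```c
#include <stdio.h>
#include <string.h>

// Returns the 1-based line number of character offset `off` in `src`:
// the first line is line 1, and each '\n' before `off` adds one.
int line_at(const char *src, size_t off)
{
    int line = 1;
    for (size_t i = 0; i < off && src[i] != '\0'; i++)
        if (src[i] == '\n')
            line++;
    return line;
}

// Formats one output record, "token:<TAB>line#", plus a trailing newline.
int format_token_line(char *buf, size_t n, const char *token, int line)
{
    return snprintf(buf, n, "%s:\t%d\n", token, line);
}
```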
You may use the sample SRC3 programs provided as test programs, and you should
also include your own collection of files for testing the scanner; however,
your scanner must accept and analyze any SRC3 program.
Errors: If the scanner detects a lexical error, it should describe the error
as precisely as possible, then exit gracefully. Use the following format:
ERROR LINE line#:\terror msg.
Example:
ERROR LINE 10: unclosed comment.
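A C sketch of the error format. The handout does not say whether errors go to standard output or standard error, or which exit status to use; this sketch assumes standard output (alongside the token listing) and EXIT_FAILURE, and the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Formats a lexical error record: "ERROR LINE line#:<TAB>msg" plus newline.
int format_error(char *buf, size_t n, int line, const char *msg)
{
    return snprintf(buf, n, "ERROR LINE %d:\t%s\n", line, msg);
}

// Reports the error and stops the scanner. Assumptions: the message goes
// to standard output with the token listing, and the exit status is
// EXIT_FAILURE; the handout specifies neither.
void lex_error(int line, const char *msg)
{
    char buf[256];
    format_error(buf, sizeof buf, line, msg);
    fputs(buf, stdout);
    exit(EXIT_FAILURE);   // exit() flushes stdout, so earlier tokens appear
}
```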
Makefile, etc.
The executable should be named src3.
README.
A simple text file, please; no Word or HTML documents. It should describe