------------------------------------------------------------------------------ Project 3 output specification ------------------------------------------------------------------------------ > In the third project, you will need to write an AST tree-tansveral routine to > manage symbol table > compute inherited and synthesized attributes > do type promotion and type checking > process semantic errors. > The same tree-transveral routine should be further enhanced in the fourth project for code generation. > Whenever your tree-tranveral routine goes into a declaration section or a statement block, you should push the newly found declaration of symbols (variables, function declarations, etc.) into a symbol table and record the scope information (also the type information and constant value if there is one). > Whenever your tree-transversal routine finds a symbol in an expression or statement, you should use the symbol table to find the declaration of the symbol and do type promotion and type checking. > The best data structure for a symbol table is a hashtable-like structure. Most hash functions only deal with integer values only, so you may have to write a subroutine to convert a string (representing IDs) into integer values for hashing. NOTE: It is required that you use a hashtable-like (along with a hash function) data structure for your symbol table. You may use a hashtable implemented by anyone, but you must use one. That is, points will be taken off if you just do a linear search on a non-hashtable symbol table. > Two possible candidates for hashtable is a) the one posted on the web b) hsearch in UNIX > Both candidates need serious work in order to make them into a working symbol table. > Scope > Each subroutine has its own symbol table, so does the main program. > Statements defined in a stament block for one subroutine can only access the symbol table for that particular subroutine, but not any other symbol table, even the one of the main program. > Scope level starts from 0. Formal parameters of a subroutine and the subroutine name has the scope level 0. > For any symbols defined in nested statement blocks, the scope level increases along with the nested level. > A subroutine declaration will exist in two symbol tables. One in its own symbol table for recursive calls and the other in the symbol table of the parent scope. > NOTE: The executable should be named as src3. NOTE: An input source file is fed to your compiler through standard input. NOTE: Defined output should be sent to standard output (even the errors). > Output of the symbol table: > Whenever your tree-transversal program comes out of a statement block, your compiler should print out the symbols declared in the very last scope level, that is, the symbols that are going to be removed. The output format is: > Above 6 items should be printed starting from begin-of-line and delimited with a space. > Don't print out any item that is not applicable to the symbol. > For example, the very last item, is for constant declarations. If the symbol is not a constant, then the output should end at the 5th item, , and do not print out an extra space. > can be names of a variable, constant, or subroutine declaration. > is the line number where the symbol declared. > can be one of constant subroutine variable > are the type of the symbol, it can be either the basic types as int real string bool or array type as [] For example, a declaration of array[1..10, n2..n20] of int (where n1 = 2 and n20 = 20 are constants) should have following output: int[1..10, 2..20] > is the scope level of the symbol declared. It is an integer starting with 0. > is the value of a constant declaration. > Its value is what the lexer produced. > If a constant is declared from another constant, then you should print out the literal form in stead of the name of the other constant. > Once the scope level 0 symbols was dumped from a symbol table, the compiler should print one line information as SYMBOL TABLE DUMPED: AT > is the name of the subroutine or the main program. > is the line number where the subroutine was declared. > NOTE: Your compiler should stop whenever it detects a semantic error, but you should dump all symbols declared up to the point including symbols declared in symbol tables that are not visible. > NOTE: The symbols should be printed in the order of their declarations. > NOTE: You are required to use a hashtable for symbol table implementation. Because in some cases a hash value is not available, you do not have to print out the hash value for each symbol. Therefore you should indicate in the README.txt file on what hashtable utility do you use and how you use the utility (point out the source files and line number where a hash function is applied). > Output of type promotions: Whenever you need to do a type promotion, for example, from a string to bool, you need to print out the promotion as following format: TYPE PROMOTION AT : The line should also start from begin-of-line and all items are delimited with a space. > is where the promotion happened. > is the original type of the expression. > is the promoted type of the expression. > The print-out format of and is defined in the symbol table output format. > is the exact expression which causes the type promotion. > Produce the expression by transverse the sub syntax tree and do not use any space or control characters. > Output of undeclared symbol error: Whenever your compiler detects a symbol that is not defined, it should print out an error message and dump the rest of symbol tables and stop the program. The error message has following format: UNDECLARED SYMBOL ERROR AT : The line should also start from begin-of-line and all items are delimited with a space. > is where the error happened. > is the symbol that was not declared. > Output of duplicated symbol error: Whenever your compiler detects a symbol that is defined in the same symbol table and same scope before, it should print out an error message and dump the rest of symbol tables and stop the program. The error message has following format: DUPLICATED SYMBOL ERROR AT , : The line should also start from begin-of-line and all items are delimited with a space. > is where the second declaration happened. > is where a prebious declaration happened. > is the symbol that was declared twice. > Output of type mismatched error: Whenever your compiler detected a type mismatch, for example, from a bool to string, you need to print out the error message defined in following format and stop the program. TYPE MISMATCH ERROR AT : The line should also start from begin-of-line and all items are delimited with a space. > is where the type mismatch happened. > is the original type of the expression. > is the promoted type of the expression. > The print-out format of and is defined in the symbol table output format. > is the exact expression which causes the type mismatch. > Produce the expression by transverse the sub syntax tree and do not use any space or control characters. > Output of mismatched symbol error: Whenever your compiler detected a mismatched symbol name, for example, for example, a subroutine (or porgram) ID does not match the one ends it. You need to print out the error message defined in following format and stop the program. SYMBOL NAME MISMATCH ERROR AT The line should also start from begin-of-line and all items are delimited with a space. > is where the symbol was declared. > is where the definition of the symbol ended. > Misc errors. Obvious, I could not exhaust all possiblilities of semantic errors in this short spec. Thus, you should print out any other error that is not listed here as: SEMANTIC ERROR AT OF
: > where is the line number where the error occurred. >
is the section of the error prohibited in the semantic analysis spec. > is a descriptive information about the error. > Please check frequently for any revision of this spec. > Example 1: prog; var a: int; var b: real begin var c: int read a; b := 2.1; a := a + b; write a, b; end; end prog > A possible print out could be (it may need further confirmation): TYPE PROMOTION AT 8: int real a TYPE PROMOTION AT 8: real int a+b c 5 variable int 1 a 2 variable int 0 b 3 variable real 0 SYMBOL TABLE DUMPED: prog AT 1 > Example 2: prog2; var a : int; var b : 2.1 begin a := a + b; end; end prog2 > A possible print out could be (it may need further confirmation): TYPE PROMOTION AT 5: int real a TYPE PROMOTION AT 5: real int a+b a 2 variable int 0 b 3 constant real 0 2.1 SYMBOL TABLE DUMPED: prog2 AT 1 > Example 3: prog3; var a : int begin a := a + b; end; end prog3 > A possible print out could be (it may need further confirmation): UNDECLARED SYMBOL ERROR AT 4: b a 2 variable int 0 SYMBOL TABLE DUMPED: prog3 AT 1 > Example 4: prog; var a : int; var c : int abc () : real; begin end; end abc def (c:int) : real; begin var c : real /* valid, since formal parameter c is in scope level 0, this variable is in scope level 1 */ a := c; /* invalid since a is not in the symbol table */ c := abc (); /* invalid abc is not in the symbol table */ c := def (); /* valid. self-recursive function calls are allowed. I take it as putting def into the symbol table for scope level 0 */ end; end def begin end; end prog > A possible print out could be (it may need further confirmation): abc 5 subroutine real 0 SYMBOL TABLE DUMPED: abc AT 5 UNDECLARED SYMBOL ERROR AT 14: a c 12 variable real 1 def 10 subroutine real 0 c 10 variable int 0 SYMBOL TABLE DUMPED: def AT 10 a 2 variable int 0 c 3 variable int 0 abc 5 subroutine real 0 def 10 subroutine real 0 SYMBOL TABLE DUMPED: prog AT 1 > FAQ: > Revision to the Semantic Anaglysis sepc: > Under the section of 'Arithmetic Expressions with String' and 'Array Expressions', the int involved in an operation with a string or array cannot be negative. > An ID used in an array's range information can only be a declared constant, and a constant is immutable, so it cannot be a target of input statement, 'read'. > One student asked what are the for a type mismatch error in following piece of code. The answer is that, since it is not possible to know for sure what should be, then this is a misc error. b: bool; c: bool; d: bool; ... b := c + d; > If you are not able to determine the section number for a misc error, then put the section number of the language spec instead, or just leave it out. > One student asked that what would be the correct error message for for the relational expression in the following case? > Answer: In this case, it should be #2, but I would accept the first one if the message clearly states the case (like the one below). prog; var x: bool; var y: int; begin x := x < y; end; end prog 1) SEMANTIC ERROR ... bool type in relational expression 2) TYPE MISMATCH ERROR.... > One student asked: do we need to type promot arrays of int real or string to bool > Answer: No spec can be 100% precise at the release date. For example, C++'s ANSI standard was finalized till 1999 while it was first created in mid-80s. I will basically grade your project on clear-cut cases or if there are room for interpretation then I will accept all possible interpretations. In the specific case raised, you can either report a promotion or a type mismatch, but you must report one of them. Having say that, you can imagine that the grading of project 3 is not as rigid as the first 2 projects can be (since their specs are less complicated), but it may take me more time to finish grading.