CompSci 142 / CSE 142
This webpage was adapted from Alex Thornton’s offering of CS 141.
CompSci 142 / CSE 142 Winter 2018
Project #5: Types and Type-checking
Due date and time: Feb 26, Monday, 11:59pm; No late submission will be accepted.
This project implements additional semantic checks on the AST generated in the previous lab. Now that we can construct a tree representation of the input Crux source code, we can check more than syntax and symbol usage. All popular computer languages implement some kind of type system. You are probably quite used to satisfying the Java type system by now. Indeed, hopefully, you are using it (alongside testing) to enforce design constraints on your code and to alert you to mistakes. Depending on the language, the type system can do type checks at runtime (Scheme, JavaScript, Python, Ruby), at compile time (Haskell, OCaml), or a mix of both. Crux sits squarely in the compile-time camp, so a Crux compiler has to check that every Crux program obeys the rules of the Crux type system before it emits machine code.
In the past we've used the Scanner to recognize illegal characters, the Parser and grammar to recognize illegal syntax, and the Symbol Table to recognize illegal usage of variables and functions. Now we add a type system so that we can detect semantic errors such as the illegal use of expressions, invalid arguments to functions, returns incompatible with their function return type, etc. In many languages the type system embodies a powerful checker, which programmers use to enforce design constraints and catch errors.
With a type system, we aim to prevent the programmer from accidentally using operations in an invalid manner. For example, it's nonsensical to add the number 5 with the boolean true. In industrial programming languages, the type system can be quite complicated, allowing for automatic conversions between types (called type coercion), operator and function overloading, multiple or single inheritance, covariant returns, etc. Crux's type system is very simple in comparison. It allows operations (add, sub, mul, div) only between 'like' types. That is, ints operate with ints and floats operate with floats, and return values must match the function return type exactly.
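The 'like types' rule can be illustrated with a minimal sketch. It is not the lab's types package: the names LikeTypesSketch, Kind, add, and returnMatches are hypothetical, and types are reduced to a plain enum purely for illustration.

```java
// Hypothetical sketch of the "like types" rule; not part of the provided lab code.
class LikeTypesSketch {
    enum Kind { INT, FLOAT, BOOL, ERROR }

    // Arithmetic is legal only when both operands share the same numeric kind.
    // There is no coercion, so INT + FLOAT is an error rather than a FLOAT.
    static Kind add(Kind left, Kind right) {
        boolean numeric = left == Kind.INT || left == Kind.FLOAT;
        return (numeric && left == right) ? left : Kind.ERROR;
    }

    // A returned value must match the declared return type exactly.
    static boolean returnMatches(Kind declared, Kind actual) {
        return declared == actual;
    }

    public static void main(String[] args) {
        System.out.println(add(Kind.INT, Kind.INT));    // prints INT
        System.out.println(add(Kind.INT, Kind.FLOAT));  // prints ERROR
        System.out.println(add(Kind.INT, Kind.BOOL));   // prints ERROR
    }
}
```

The same shape would apply to sub, mul, and div under this rule.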
In spite of the simplicity of our system, we still practice good coding techniques developed for more complex systems. The research article "Design Patterns for Teaching Type Checking in a Compiler Construction Course" demonstrates how to use the Composite Pattern to represent a set of base types together with operations which can be performed on those types. The types package in the lab implements this design pattern. The abstract base class, Type, defines a default implementation for each of the operations in the Crux language. A convenience method, getBaseType(String typeStr), is present for converting type lexemes into Type objects.
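A minimal sketch of this arrangement follows. It assumes that the default implementation of each operation reports an error (using an ErrorType that carries a message) and that subclasses override only the operations they support. The wrapper class CompositeDefaultsSketch, the error message wording, and the reduced set of two operations are assumptions made for illustration, not the contents of the provided types package.

```java
// Hypothetical sketch of a base class with default (failing) operations.
class CompositeDefaultsSketch {
    static abstract class Type {
        // Default: an operation is illegal unless a subclass overrides it.
        Type add(Type that) {
            return new ErrorType("Cannot add " + this + " with " + that + ".");
        }
        Type not() {
            return new ErrorType("Cannot negate " + this + ".");
        }
    }

    static final class IntType extends Type {
        // int + int is int; anything else falls back to the default error.
        @Override Type add(Type that) {
            return (that instanceof IntType) ? new IntType() : super.add(that);
        }
        @Override public String toString() { return "int"; }
    }

    static final class BoolType extends Type {
        @Override Type not() { return new BoolType(); }
        @Override public String toString() { return "bool"; }
    }

    static final class ErrorType extends Type {
        final String message;
        ErrorType(String message) { this.message = message; }
        @Override public String toString() { return "ErrorType(" + message + ")"; }
    }

    // Converts a type lexeme into a Type object (only two lexemes shown here).
    static Type getBaseType(String typeStr) {
        switch (typeStr) {
            case "int":  return new IntType();
            case "bool": return new BoolType();
            default:     return new ErrorType("Unknown type: " + typeStr);
        }
    }
}
```

With this layout, adding a new operation means adding one failing default to Type and overriding it only in the types that support it.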
Crux has the numerical types int and float, represented by IntType and FloatType. It also has the logical type bool, represented by BoolType. Finally, the VoidType is present to distinguish functions that return a value from those which do not.
On top of the base types, Crux allows the creation of arrays, represented by ArrayType, and addresses, represented by AddressType. The constructor for ArrayType takes another type argument to distinguish what type the array contains. Using this argument, we can make an array of one of the base types, or an array of arrays. The constructor for AddressType takes another type argument to distinguish what type of value the address references. Using this argument, we can make addresses over any other type, including arrays. This ability is especially useful for the address arithmetic that computes offsets into multi-dimensional arrays. More powerful languages use the composite technique to build up structures and classes.
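The nesting described above can be sketched as follows. This is a simplified illustration, not the provided types package: the wrapper class NestedTypeSketch, the single-argument constructors, the toString formats, and the error messages are assumptions. It also assumes that indexing an address of an array yields an address of the element type, which is one plausible way to model offsets into multi-dimensional arrays.

```java
// Hypothetical sketch of composing array and address types.
class NestedTypeSketch {
    static abstract class Type {
        // By default a type can be neither indexed nor dereferenced.
        Type index(Type amount) { return new ErrorType("Cannot index " + this + "."); }
        Type deref() { return new ErrorType("Cannot dereference " + this + "."); }
    }

    static final class IntType extends Type {
        @Override public String toString() { return "int"; }
    }

    static final class ErrorType extends Type {
        final String message;
        ErrorType(String message) { this.message = message; }
        @Override public String toString() { return "ErrorType(" + message + ")"; }
    }

    static final class ArrayType extends Type {
        final Type base;                       // the element type, possibly another array
        ArrayType(Type base) { this.base = base; }
        @Override public String toString() { return "array(" + base + ")"; }
    }

    static final class AddressType extends Type {
        final Type base;                       // the type of value the address references
        AddressType(Type base) { this.base = base; }

        @Override Type deref() { return base; }

        // Indexing an address of an array with an int gives an address of one element.
        @Override Type index(Type amount) {
            if (base instanceof ArrayType && amount instanceof IntType) {
                return new AddressType(((ArrayType) base).base);
            }
            return super.index(amount);
        }
        @Override public String toString() { return "Address(" + base + ")"; }
    }
}
```

Under these assumptions, an address of a two-dimensional int array is Address(array(array(int))); indexing it twice gives Address(int), and dereferencing that gives int.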
A computer language is not very useful without the function/procedure abstraction. The Crux compiler has a FuncType which tracks both the argument TypeList and the return type of a function. Within the type system, the function type could be used to construct functions that return/accept arrays, or even functions that return/accept other functions. However, the Crux grammar prevents us from expressing these more complex programs, restricting us to only the base types.
Errors which might occur during type checking are represented with the ErrorType, which contains a field, String message, to convey the reason for the error.
For each operation available in the Crux language, we implement a checking function in the Type base class. Most of these operations, such as the basic arithmetic operations, are immediately recognizable.
Method/Operation | Description
Type add(Type that) | Addition of two expressions.
Type sub(Type that) | Subtraction of two expressions.
Type mul(Type that) | Multiplication of two expressions.
Type div(Type that) | Division of two expressions.
Type and(Type that) | Logical and of two expressions.
Type or(Type that) | Logical or of two expressions.
Type not() | Negation of one expression.
Type compare(Type that) | Logical comparison of two expressions.
Type deref() | Obtain the value at a given address.
Type index(Type that) | Index into an array.
Type call(Type args) | A function call.
Type assign(Type source) | Assign an expression to a designator.
We also implement an additional method, boolean equivalent(Type that), which allows the type system to detect if two type objects are structurally equivalent. This operation comes in especially handy when checking that the TypeList of a function call matches the function's signature.
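One way structural equivalence might look is sketched below, together with a call check that compares an argument TypeList against a function's parameters. The wrapper class EquivalenceSketch, the varargs TypeList constructor, the field names, and the error message are assumptions made for illustration and may differ from the provided types package.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of structural equivalence and a call check.
class EquivalenceSketch {
    static abstract class Type {
        abstract boolean equivalent(Type that);
    }

    static final class IntType extends Type {
        @Override boolean equivalent(Type that) { return that instanceof IntType; }
    }

    static final class FloatType extends Type {
        @Override boolean equivalent(Type that) { return that instanceof FloatType; }
    }

    static final class ErrorType extends Type {
        final String message;
        ErrorType(String message) { this.message = message; }
        @Override boolean equivalent(Type that) { return false; }
    }

    static final class ArrayType extends Type {
        final Type base;
        ArrayType(Type base) { this.base = base; }
        // Two arrays are equivalent when their element types are equivalent.
        @Override boolean equivalent(Type that) {
            return that instanceof ArrayType && base.equivalent(((ArrayType) that).base);
        }
    }

    static final class TypeList extends Type {
        final List<Type> items;
        TypeList(Type... items) { this.items = Arrays.asList(items); }
        // Lists are equivalent when they have the same length and pairwise equivalent items.
        @Override boolean equivalent(Type that) {
            if (!(that instanceof TypeList)) return false;
            List<Type> other = ((TypeList) that).items;
            if (items.size() != other.size()) return false;
            for (int i = 0; i < items.size(); i++) {
                if (!items.get(i).equivalent(other.get(i))) return false;
            }
            return true;
        }
    }

    static final class FuncType extends Type {
        final TypeList params;
        final Type returnType;
        FuncType(TypeList params, Type returnType) {
            this.params = params;
            this.returnType = returnType;
        }
        @Override boolean equivalent(Type that) {
            if (!(that instanceof FuncType)) return false;
            FuncType f = (FuncType) that;
            return params.equivalent(f.params) && returnType.equivalent(f.returnType);
        }
        // A call type-checks when the argument list matches the parameter list.
        Type call(Type args) {
            return params.equivalent(args)
                    ? returnType
                    : new ErrorType("Arguments do not match the function signature.");
        }
    }
}
```

Because equivalence is structural, two separately constructed ArrayType objects over int compare as equivalent even though they are different Java objects.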
In order to make use of the type system, we shall have to make some modifications to our existing codebase. One of these changes is the addition of a type field to the Symbol class. When a function, variable, or array is declared, we shall store the type information with the newly created symbol. The type field allows the type-checker to access the information when it sees an access of that symbol in the AST.
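As a rough illustration, a symbol that carries its type might look like the sketch below. The wrapper class SymbolTypeSketch, the accessor names, and the flat map standing in for the SymbolTable are all hypothetical; your existing Symbol and SymbolTable classes will have their own shape.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of storing a type on each symbol at declaration time.
class SymbolTypeSketch {
    static abstract class Type {}

    static final class IntType extends Type {
        @Override public String toString() { return "int"; }
    }

    static final class Symbol {
        private final String name;
        private Type type;                     // filled in when the declaration is parsed

        Symbol(String name) { this.name = name; }

        String name() { return name; }
        Type type() { return type; }
        void setType(Type type) { this.type = type; }

        @Override public String toString() { return "Symbol(" + name + ":" + type + ")"; }
    }

    // Stand-in for a symbol table: records the declared type with the new symbol.
    static Symbol declare(Map<String, Symbol> table, String name, Type type) {
        Symbol symbol = new Symbol(name);
        symbol.setType(type);
        table.put(name, symbol);
        return symbol;
    }

    public static void main(String[] args) {
        Map<String, Symbol> table = new HashMap<>();
        declare(table, "x", new IntType());
        // Later, the type checker looks the type up when it sees an access of x.
        System.out.println(table.get("x"));    // prints Symbol(x:int)
    }
}
```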
Additionally, the Parser needs some modification in order to attach or update the type of newly created symbols. For example, I found it convenient to return a Type from the type() grammar rule and a TypeList from the parameter_list() grammar rule. I also found that the addition of the helper Type tryResolveType(String typeStr), which works analogously to tryResolveSymbol, clarified my intent in the type() grammar rule of the Parser. Finally, an Integer expectInteger() method improved the parsing of array indexes during array declaration.
We implement the TypeChecker as a CommandVisitor. Its job is to walk the AST produced by the Parser and check for any of the type errors that may occur in the Crux language. It accomplishes this task by first descending to the leaves of the tree and then propagating type information up to the root as it unwinds. The checker must therefore associate some type information with each node of the AST.
Java does not allow us to extend the AST nodes for the purpose of adding type information, nor are we able to change the argument type of the visitor methods. As a result of this restriction, we create a HashMap<ast.Command, Type> typeMap which records the association for us. Because the association is external to the nodes in the AST, the TypeChecker manually manages the association.
The TypeChecker also contains a StringBuffer errorBuffer field for recording any type errors it encounters during its traversal of the AST. Each time the TypeChecker enters an association into the typeMap, it checks for the ErrorType and records a message into the errorBuffer when one is present. It's convenient to create a wrapper for the typeMap.put() method, which catches any type error messages.
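A minimal sketch of such a wrapper is shown below. The Node class stands in for ast.Command, and the method names put, getType, hasError, and errorReport, along with the error line format, are assumptions for illustration rather than the required output format of the lab.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an external node-to-type map with error capture.
class TypeMapSketch {
    // Stand-in for ast.Command; it keeps identity-based equals/hashCode,
    // so each AST node is its own key.
    static class Node {
        final String label;
        Node(String label) { this.label = label; }
    }

    static abstract class Type {}
    static final class IntType extends Type {}
    static final class ErrorType extends Type {
        final String message;
        ErrorType(String message) { this.message = message; }
    }

    private final Map<Node, Type> typeMap = new HashMap<>();
    private final StringBuffer errorBuffer = new StringBuffer();

    // Wrapper around typeMap.put(): records a message whenever an ErrorType is stored.
    void put(Node node, Type type) {
        if (type instanceof ErrorType) {
            errorBuffer.append("TypeError(").append(node.label).append(")[")
                       .append(((ErrorType) type).message).append("]\n");
        }
        typeMap.put(node, type);
    }

    Type getType(Node node) { return typeMap.get(node); }
    boolean hasError() { return errorBuffer.length() != 0; }
    String errorReport() { return errorBuffer.toString(); }
}
```

Routing every association through one method means no visitor method can store an ErrorType without the error also being reported.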
Some of the nodes in the AST require additional checks. For example, both the IfElseBranch and WhileLoop must check that their conditions are a BoolType. Return statements must verify that they are compatible with the function being defined. At least one check, that functions declaring a non-Void return value actually return on all possible paths, requires significant thought. I strongly suggest writing helper and convenience methods.
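To give a feel for the all-paths check, here is one conservative rule sketched over a toy statement model; it is not necessarily the rule the tests expect. It assumes a block returns on all paths if any statement in it does, an if/else does only when both branches do, and a while loop never counts because its body may run zero times. The classes Return, Other, IfElse, While and the helpers block, blockReturns, and stmtReturns are hypothetical stand-ins for the real AST nodes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a conservative "returns on all paths" check.
class ReturnPathSketch {
    interface Stmt {}
    static final class Return implements Stmt {}
    static final class Other implements Stmt {}      // any statement that cannot return

    static final class IfElse implements Stmt {
        final List<Stmt> thenBlock;
        final List<Stmt> elseBlock;
        IfElse(List<Stmt> thenBlock, List<Stmt> elseBlock) {
            this.thenBlock = thenBlock;
            this.elseBlock = elseBlock;
        }
    }

    static final class While implements Stmt {
        final List<Stmt> body;
        While(List<Stmt> body) { this.body = body; }
    }

    // Convenience for building statement lists.
    static List<Stmt> block(Stmt... stmts) { return Arrays.asList(stmts); }

    // A block returns on all paths if some statement in it always returns.
    static boolean blockReturns(List<Stmt> block) {
        for (Stmt s : block) {
            if (stmtReturns(s)) return true;
        }
        return false;
    }

    static boolean stmtReturns(Stmt s) {
        if (s instanceof Return) return true;
        if (s instanceof IfElse) {
            IfElse branch = (IfElse) s;
            // Both branches must return; an empty else block does not.
            return blockReturns(branch.thenBlock) && blockReturns(branch.elseBlock);
        }
        // While loops and all other statements are treated as possibly falling through.
        return false;
    }
}
```

Under this rule, a function body whose only return sits inside a while loop would be reported as missing a return, which is the safe direction to err in.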
For convenience, you may get a start on this lab by using a pre-made Lab5_Types.zip project, which contains the types package. As before, you are both allowed and encouraged to make your program easier to read and maintain by implementing helper functions with good names.
Test cases are available in this tests.zip file. The provided tests are not meant to be exhaustive. You are strongly encouraged to construct your own. (If Chrome gives you a warning that "tests.zip is not commonly downloaded and could be dangerous", it means that Google hasn't performed an automated virus scan. This warning can be safely ignored, as the tests.zip file contains only text files.)
A zip file, named Crux.zip, containing the following files (in the crux package):
crux package: NonTerminal, Parser, Scanner, Compiler, Token, Symbol and SymbolTable.
ast package: A class for each Command, a CommandVisitor interface, and a PrettyPrinter.
types package: A class for each Type, and a TypeChecker implementing the CommandVisitor interface.
We will release an auto tester to make sure that your work meets our requirements.
Adapted from a similar document from CS 141 Winter 2013 by Alex Thornton.