This webpage was adapted from Alex Thornton’s offering of CS 141


CompSci 142 / CSE 142 Winter 2018
Project #5: Types and Type-checking

Due date and time: Feb 26, Monday, 11:59pm; No late submission will be accepted.


Introduction

Now that we can construct a tree representation of input Crux source code, this project implements additional semantic checks on the AST generated in the previous lab. All popular computer languages implement some kind of type system. You are probably quite used to satisfying the Java type system by now. Indeed, hopefully, you are using it (alongside testing) to enforce design constraints on your code and to alert you to mistakes. Depending on the language, the type system can do type checks at runtime (Scheme, JavaScript, Python, Ruby), at compile time (Haskell, OCaml), or a mix of both. Crux sits squarely in the compile-time camp, so a Crux compiler has to check that all Crux programs obey the rules of the Crux type system before it emits machine code.

In the past we've used the Scanner to recognize illegal characters, the Parser and grammar to recognize illegal syntax, and the Symbol Table to recognize illegal usage of variables and functions. Now we add a type system so that we can detect semantic errors such as the illegal use of expressions, invalid arguments to functions, returns incompatible with their function return type, etc. In many languages the type system embodies a powerful checker, which programmers use to enforce design constraints and catch errors.

Designing a Type-checking System

With a type system, we aim to prevent the programmer from accidentally using operations in an invalid manner. For example, it's nonsensical to add the number 5 to the boolean true. In industrial programming languages, the type system can be quite complicated, allowing for automatic conversions between types (called type coercion), operator and function overloading, multiple or single inheritance, covariant returns, etc. Crux's type system is very simple in comparison. It allows operations (add, sub, mul, div) only between 'like' types. That is, ints operate with ints, floats operate with floats, and return values must match the function return type exactly.

In spite of the simplicity of our system, we still practice good coding techniques developed for more complex systems. The research article Design Patterns for Teaching Type Checking in a Compiler Construction Course demonstrates how to use the Composite Pattern to represent a set of base types together with the operations that can be performed on those types. The types package in this lab implements this design pattern. The abstract base class, Type, defines a default implementation for each of the operations in the Crux language. A convenience method, getBaseType(String typeStr), is present for converting type lexemes into Type objects.
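The default-implementation idea can be sketched as follows. This is a minimal illustration and not the code shipped in the types package: only add and and are shown, and the error message wording, the toString formats, and the body of getBaseType are assumptions.

```java
// Sketch only: the real classes in the types package may differ.
abstract class Type {
    // Default behavior: the operation is unsupported, so report an error.
    Type add(Type that) {
        return new ErrorType("Cannot add " + this + " with " + that + ".");
    }

    Type and(Type that) {
        return new ErrorType("Cannot compute " + this + " and " + that + ".");
    }

    // Converts a type lexeme into a Type object (assumed behavior).
    static Type getBaseType(String typeStr) {
        switch (typeStr) {
            case "int":   return new IntType();
            case "float": return new FloatType();
            case "bool":  return new BoolType();
            case "void":  return new VoidType();
            default:      return new ErrorType("Unknown type: " + typeStr);
        }
    }
}

class IntType extends Type {
    // Only int + int is legal; everything else falls back to the default error.
    @Override Type add(Type that) {
        return (that instanceof IntType) ? new IntType() : super.add(that);
    }
    @Override public String toString() { return "int"; }
}

class FloatType extends Type {
    @Override Type add(Type that) {
        return (that instanceof FloatType) ? new FloatType() : super.add(that);
    }
    @Override public String toString() { return "float"; }
}

class BoolType extends Type {
    // Booleans support logical and, but inherit the failing add.
    @Override Type and(Type that) {
        return (that instanceof BoolType) ? new BoolType() : super.and(that);
    }
    @Override public String toString() { return "bool"; }
}

class VoidType extends Type {
    @Override public String toString() { return "void"; }
}

class ErrorType extends Type {
    final String message;
    ErrorType(String message) { this.message = message; }
    @Override public String toString() { return "ErrorType(" + message + ")"; }
}
```

Because every subclass inherits the failing defaults, each concrete type only overrides the handful of operations it actually supports.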

The Base Types

Crux has the numerical types int and float represented by IntType and FloatType. It also has the logical type bool, represented by BoolType. Finally, the VoidType is present to distinguish functions that return a value from those which do not.

The Composite Types

On top of the base types, Crux allows the creation of arrays, represented by ArrayType, and addresses, represented by AddressType. The constructor for ArrayType takes another type argument to distinguish what type the array contains. Using this argument, we can make an array of one of the base types, or an array of arrays. The constructor for AddressType takes another type argument to distinguish what type of value the address references. Using this argument, we can make addresses over any other type, including arrays. This ability is especially useful for the address arithmetic that computes offsets into multi-dimensional arrays. More powerful languages use the composite technique to build up structures and classes.
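As a rough sketch of how the composite types nest, the classes below wrap one type inside another. The extent parameter of ArrayType, the rule that indexing an address of an array yields an address of the element type, and the toString formats are assumptions made for this illustration.

```java
// Sketch only: constructor signatures and rules here are assumptions.
abstract class Type {
    Type deref() {
        return new ErrorType("Cannot dereference " + this + ".");
    }
    Type index(Type that) {
        return new ErrorType("Cannot index " + this + " with " + that + ".");
    }
}

class IntType extends Type {
    @Override public String toString() { return "int"; }
}

class ErrorType extends Type {
    final String message;
    ErrorType(String message) { this.message = message; }
    @Override public String toString() { return "ErrorType(" + message + ")"; }
}

class ArrayType extends Type {
    final int extent;  // number of elements (assumed field)
    final Type base;   // element type, possibly another ArrayType
    ArrayType(int extent, Type base) {
        this.extent = extent;
        this.base = base;
    }
    @Override public String toString() { return "array[" + extent + "," + base + "]"; }
}

class AddressType extends Type {
    final Type base;   // type of the value the address refers to
    AddressType(Type base) { this.base = base; }

    // Loading from an address produces the referenced type.
    @Override Type deref() { return base; }

    // Indexing an address of an array with an int steps one dimension down.
    @Override Type index(Type that) {
        if (base instanceof ArrayType && that instanceof IntType) {
            return new AddressType(((ArrayType) base).base);
        }
        return super.index(that);
    }
    @Override public String toString() { return "Address(" + base + ")"; }
}
```

With these definitions, indexing an Address(array[2,array[3,int]]) twice and then dereferencing peels off one layer at a time until only int remains.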

The Function Type

A computer language is not very useful without the function/procedure abstraction. The Crux compiler has a FuncType which tracks both the argument TypeList and the return type of a function. Within the type system, the function type could be used to construct functions that return/accept arrays, or even functions that return/accept other functions. However, the Crux grammar prevents us from expressing these more complex programs, restricting us to only the base types.

The Error Type

Errors which might occur during type checking are represented with the ErrorType, which contains a field, String message, to convey the reason for the error.

The Operations

For each operation available in the Crux language, we implement a checking function in the Type base class. Most of these operations, such as the basic arithmetic operations, are immediately recognizable.

Method/Operation            Description

Type add(Type that)         Addition of two expressions.
Type sub(Type that)         Subtraction of two expressions.
Type mul(Type that)         Multiplication of two expressions.
Type div(Type that)         Division of two expressions.
Type and(Type that)         Logical and of two expressions.
Type or(Type that)          Logical or of two expressions.
Type not()                  Negation of one expression.
Type compare(Type that)     Logical comparison of two expressions.
Type deref()                Obtain the value at a given address.
Type index(Type that)       Index into an array.
Type call(Type args)        A function call.
Type assign(Type source)    Assign an expression to a designator.

Equivalence

We also implement an additional method boolean equivalent(Type that), which allows the type system to detect if two type objects are structurally equivalent. This operation comes in especially handy when checking that the TypeList of a function call matches the function's signature.
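One way structural equivalence might drive the check of a call against a signature is sketched below. The field names, the varargs TypeList constructor, the extent field of ArrayType, and the error wording are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: field names and message wording are assumptions.
abstract class Type {
    abstract boolean equivalent(Type that);

    // Default: most types are not callable.
    Type call(Type args) {
        return new ErrorType("Cannot call " + this + " using " + args + ".");
    }
}

class IntType extends Type {
    @Override boolean equivalent(Type that) { return that instanceof IntType; }
    @Override public String toString() { return "int"; }
}

class FloatType extends Type {
    @Override boolean equivalent(Type that) { return that instanceof FloatType; }
    @Override public String toString() { return "float"; }
}

class ErrorType extends Type {
    final String message;
    ErrorType(String message) { this.message = message; }
    // An error is never treated as equivalent to anything (assumed policy).
    @Override boolean equivalent(Type that) { return false; }
    @Override public String toString() { return "ErrorType(" + message + ")"; }
}

class ArrayType extends Type {
    final int extent;
    final Type base;
    ArrayType(int extent, Type base) { this.extent = extent; this.base = base; }
    // Arrays match when both the extent and the element type match.
    @Override boolean equivalent(Type that) {
        if (!(that instanceof ArrayType)) return false;
        ArrayType other = (ArrayType) that;
        return extent == other.extent && base.equivalent(other.base);
    }
    @Override public String toString() { return "array[" + extent + "," + base + "]"; }
}

class TypeList extends Type {
    final List<Type> types;
    TypeList(Type... types) { this.types = Arrays.asList(types); }
    // Lists match when they have the same length and match element by element.
    @Override boolean equivalent(Type that) {
        if (!(that instanceof TypeList)) return false;
        List<Type> other = ((TypeList) that).types;
        if (types.size() != other.size()) return false;
        for (int i = 0; i < types.size(); i++) {
            if (!types.get(i).equivalent(other.get(i))) return false;
        }
        return true;
    }
    @Override public String toString() { return "TypeList" + types; }
}

class FuncType extends Type {
    final TypeList args;
    final Type ret;
    FuncType(TypeList args, Type ret) { this.args = args; this.ret = ret; }
    @Override boolean equivalent(Type that) {
        if (!(that instanceof FuncType)) return false;
        FuncType other = (FuncType) that;
        return args.equivalent(other.args) && ret.equivalent(other.ret);
    }
    // A call type-checks only when the supplied arguments match the signature.
    @Override Type call(Type callArgs) {
        return args.equivalent(callArgs) ? ret : super.call(callArgs);
    }
    @Override public String toString() { return "func(" + args + "):" + ret; }
}
```

Comparing structure rather than object identity matters here because the Parser and the TypeChecker will typically create separate Type objects for the same type.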

Laying Some Groundwork

In order to make use of the type system, we shall have to make some modifications to our existing codebase. One of these changes includes the addition of a type field to the Symbol class. When a function, variable, or array is declared we shall store the type information with the newly created symbol. The type field allows the type-checker to access the information when it sees an access of that symbol in the AST.
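A minimal sketch of a Symbol that carries its type is shown below. The accessor names, the setter, and the toString format are assumptions; your existing Symbol class will likely have other members as well.

```java
// Sketch only: a stand-in Type hierarchy so the example compiles on its own.
abstract class Type {}

class IntType extends Type {
    @Override public String toString() { return "int"; }
}

class Symbol {
    private final String name;
    private Type type;  // attached by the Parser once the declaration is parsed

    Symbol(String name) { this.name = name; }

    String name() { return name; }
    Type type() { return type; }
    void setType(Type type) { this.type = type; }

    // Report the type alongside the name (format is an assumption).
    @Override public String toString() {
        return "Symbol(" + name + ":" + type + ")";
    }
}
```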

Additionally, the Parser needs some modification in order to attach or update the type of newly created symbols. For example, I found it convenient to return a Type from the type() grammar rule and a TypeList from the parameter_list() grammar rule. I also found that adding the helper Type tryResolveType(String typeStr), which works analogously to tryResolveSymbol, clarified my intent in the type() grammar rule of the Parser. Finally, an Integer expectInteger() method improved the parsing of array indexes during array declaration.

The Type Checker

We implement the TypeChecker as a CommandVisitor. Its job is to walk the AST produced by the Parser and check for any of the type errors that may occur in the Crux language. It accomplishes this task by first descending to the leaves of the tree and then propagating type information up to the root as it unwinds. The checker must therefore associate some type information with each node of the AST.

Java does not allow us to extend the AST nodes for the purpose of adding type information, nor are we able to change the argument type of the visitor methods. As a result of this restriction, we create a HashMap<ast.Command, Type> typeMap which records the association for us. Because the association is external to the nodes in the AST, the TypeChecker manually manages the association.
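The bottom-up propagation with an external map can be sketched on a toy AST. The node classes and the Visitor interface below are hypothetical stand-ins for ast.Command and CommandVisitor, reduced to three node kinds so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for ast.Command, its subclasses, and CommandVisitor.
interface Command { void accept(Visitor v); }

interface Visitor {
    void visit(LiteralInt node);
    void visit(LiteralBool node);
    void visit(Addition node);
}

class LiteralInt implements Command {
    public void accept(Visitor v) { v.visit(this); }
}

class LiteralBool implements Command {
    public void accept(Visitor v) { v.visit(this); }
}

class Addition implements Command {
    final Command left, right;
    Addition(Command left, Command right) { this.left = left; this.right = right; }
    public void accept(Visitor v) { v.visit(this); }
}

// Minimal types, just enough to show propagation.
abstract class Type {
    Type add(Type that) { return new ErrorType("Cannot add " + this + " with " + that + "."); }
}
class IntType extends Type {
    @Override Type add(Type that) { return (that instanceof IntType) ? new IntType() : super.add(that); }
    @Override public String toString() { return "int"; }
}
class BoolType extends Type {
    @Override public String toString() { return "bool"; }
}
class ErrorType extends Type {
    final String message;
    ErrorType(String message) { this.message = message; }
    @Override public String toString() { return "ErrorType(" + message + ")"; }
}

class MiniTypeChecker implements Visitor {
    // Nodes do not override equals/hashCode, so keys are compared by identity.
    final Map<Command, Type> typeMap = new HashMap<>();

    // Leaves know their own type.
    public void visit(LiteralInt node) { typeMap.put(node, new IntType()); }
    public void visit(LiteralBool node) { typeMap.put(node, new BoolType()); }

    // Interior nodes visit their children first, then combine the results.
    public void visit(Addition node) {
        node.left.accept(this);
        node.right.accept(this);
        Type lhs = typeMap.get(node.left);
        Type rhs = typeMap.get(node.right);
        typeMap.put(node, lhs.add(rhs));
    }
}
```

Since the visitor methods return void, the map is the only channel through which a parent node learns the types of its children.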

Reporting Errors

The TypeChecker also contains a StringBuffer errorBuffer field for recording any type errors it encounters during its traversal of the AST. Each time the TypeChecker enters an association into the typeMap it checks for the ErrorType and records a message into the errorBuffer when one is present. It's convenient to create a wrapper for the typeMap.put() method, which catches any type error messages.
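A wrapper of this kind might look like the sketch below. The class name TypeRecorder, the helper method names, and the "TypeError: ..." line format are assumptions; the expected output in the provided tests determines the real format.

```java
import java.util.HashMap;

// Hypothetical stand-ins so the sketch compiles on its own.
class Command {}
abstract class Type {}
class IntType extends Type {}
class ErrorType extends Type {
    final String message;
    ErrorType(String message) { this.message = message; }
}

class TypeRecorder {
    private final HashMap<Command, Type> typeMap = new HashMap<>();
    private final StringBuffer errorBuffer = new StringBuffer();

    // Single entry point for recording a node's type: every ErrorType that
    // enters the map also leaves a line in the error buffer.
    void put(Command node, Type type) {
        if (type instanceof ErrorType) {
            errorBuffer.append("TypeError: ")
                       .append(((ErrorType) type).message)
                       .append('\n');
        }
        typeMap.put(node, type);
    }

    Type getType(Command node) { return typeMap.get(node); }
    boolean hasError() { return errorBuffer.length() != 0; }
    String errorReport() { return errorBuffer.toString(); }
}
```

Routing every insertion through one method means no visitor method can forget to report an error.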

Some of the nodes in the AST require additional checks. For example, both the IfElseBranch and the WhileLoop must check that their conditions are a BoolType. Return statements must verify that they are compatible with the function being defined. At least one check, that functions declaring a non-void return value actually return on all possible paths, requires significant thought. I strongly suggest writing helper and convenience methods.
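One conservative way to approach the all-paths check is sketched below. The statement classes are hypothetical stand-ins rather than the lab's AST nodes, and the rules (a block returns if any statement in it always returns, an if-else returns only if both branches do, a loop never guarantees a return) are one possible analysis, not necessarily the one the Crux Specification requires.

```java
import java.util.List;

// Hypothetical statement nodes, reduced to what the analysis needs.
abstract class Stmt {}
class Return extends Stmt {}
class Other extends Stmt {}   // any statement that cannot return

class IfElse extends Stmt {
    final List<Stmt> thenBlock, elseBlock;
    IfElse(List<Stmt> thenBlock, List<Stmt> elseBlock) {
        this.thenBlock = thenBlock;
        this.elseBlock = elseBlock;
    }
}

class While extends Stmt {
    final List<Stmt> body;
    While(List<Stmt> body) { this.body = body; }
}

class ReturnPaths {
    // A block always returns if at least one of its statements always returns.
    static boolean alwaysReturns(List<Stmt> block) {
        for (Stmt s : block) {
            if (alwaysReturns(s)) return true;
        }
        return false;
    }

    static boolean alwaysReturns(Stmt s) {
        if (s instanceof Return) return true;
        if (s instanceof IfElse) {
            // Both branches must return, since either one may be taken.
            IfElse branch = (IfElse) s;
            return alwaysReturns(branch.thenBlock) && alwaysReturns(branch.elseBlock);
        }
        // A While body may run zero times, so it guarantees nothing.
        return false;
    }
}
```

A function declared with a non-void return type would then be flagged whenever alwaysReturns on its body yields false.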

What do I need to implement?

  • Modify your Symbol classes to contain a field for storing the type.
  • Modify your Parser so that it attaches the type to the symbols when they are created.
  • Supply the appropriate implementation for the operation methods of classes in the types package.
  • Complete the implementation of the TypeChecker visitor, so that it passes only those programs which satisfy all of the Type Semantics section of the Crux Specification.
  • Add a type specification to each of the predefined functions.
  • Enforce that the main function has the appropriate signature.

For convenience, you may get a start on this lab by using a pre-made Lab5_Types.zip project, which contains the types package. As before, you are both allowed and encouraged to make your program easier to read and maintain by implementing helper functions with good names.

Changes from Lab 4: Abstract Syntax Tree

  • Update the Symbol.toString method to also report the type.
  • Update Compiler.main driver to call TypeChecker.check on the Parser's returned AST.

Testing

Test cases are available in this tests.zip file. The provided tests are not meant to be exhaustive. You are strongly encouraged to construct your own. (If Chrome gives you a warning that "tests.zip is not commonly downloaded and could be dangerous", it means that Google hasn't performed an automated virus scan. This warning can be safely ignored, as the tests.zip file contains only text files.)

Deliverables

A zip file, named Crux.zip, containing the following files (in the crux package):

  • The crux package: NonTerminal, Parser, Scanner, Compiler, Token, Symbol and SymbolTable.
  • The ast package: A class for each Command, a CommandVisitor interface, and a PrettyPrinter.
  • The types package: A class for each Type, and a TypeChecker implementing the CommandVisitor interface.

We will release an auto tester to make sure that your work meets our requirements.


Adapted from a similar document from CS 141 Winter 2013 by Alex Thornton.