CompSci 142 / CSE 142 Winter 2018
This webpage was adapted from Alex Thornton’s offering of CS 141.
Project #4: Abstract Syntax Tree
Due date and time: Friday, Feb 16, 11:59pm. No late submissions will be accepted.
This project implements the intermediate representation that we use to model crux programs. Now that the parser can recognize syntax errors and detect symbol definition and usage errors, we can proceed with building an intermediate representation of crux programs. Between the front-end (parser) and back-end (code generator), we'll represent crux programs as an Abstract Syntax Tree (AST). Once crux source code has been transformed into an AST data structure, we can further analyze the crux program to detect type errors (Lab 5), perform optimizations, and generate code.
The AST that we create must faithfully represent the crux program being compiled. Additionally, we seek to make the AST as clear and easy to use as possible. Because we will later perform traversals over the AST to check for semantic constraints, we consider all of the following issues in the design:
Concise. We should clean up any unnecessary features that may be present in the crux source. For example, the AST does not need to retain extra parentheses that may have been used in an expression.
Meaningful. Nodes in the AST should carry some kind of semantic meaning. For example, we must track when and where variables and functions are declared or defined.
Instructive. Nodes in the AST should represent an action (or instruction) that a computer might take. For example, one node can represent an if_statement. It can have 3 children: condition, thenBlock, and elseBlock.
Organized. Nodes in the AST should be categorically distinguishable. That is, we should be able to identify the difference between statements and expressions.
In Lab 2, we wrote a recursive descent parser. We recorded the entry and exit of each function and printed out the parse tree of crux source code. That tree records how a crux sentence (input source code) is broken down into syntactic pieces according to the rules of the crux grammar. As its name implies, the Abstract Syntax Tree abstracts away some of the pieces that might be present in the parse tree.
A crux sentence is allowed to carry extra information that does not necessarily change the semantics of the program. For example, according to the crux grammar, parentheses can be used to nest expressions arbitrarily. Consider the following code examples, their parse trees, and the corresponding ASTs.
Input Crux Statement | Parse Tree | Abstract Syntax Tree
if true { return 5; } | IF_STATEMENT EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 LITERAL STATEMENT_BLOCK STATEMENT_LIST STATEMENT RETURN_STATEMENT EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 LITERAL | ast.IfElseBranch(1,1) ast.LiteralBoolean(1,4)[TRUE] ast.StatementList(2,4) ast.Return(2,4) ast.LiteralInteger(2,11)[5] ast.StatementList(4,1)
if (((((true))))) { return 5; } | IF_STATEMENT EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 LITERAL STATEMENT_BLOCK STATEMENT_LIST STATEMENT RETURN_STATEMENT EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 LITERAL | ast.IfElseBranch(1,1) ast.LiteralBoolean(1,9)[TRUE] ast.StatementList(2,4) ast.Return(2,4) ast.LiteralInteger(2,11)[5] ast.StatementList(4,1)
In the crux grammar, the expression chain (expression0 → expression1 → expression2 → expression3) contains only right-associative rules, which generate a right-associative parse tree. In spite of the parse tree generated, the operators and, or, add, sub, mul, and div are, semantically, all left-associative. The parse tree accurately captures precedence, but incorrectly represents operator associativity. Using right association for the grammar rules aids the construction of a left-factored LL(1) grammar, which in turn aids writing a recursive descent parser. However, we must now take care to ensure that the AST captures the left-associative semantics of these operators.
Input Crux Statement | Parse Tree | Abstract Syntax Tree
return 3-1-1; // == 1 | RETURN_STATEMENT EXPRESSION0 EXPRESSION1 EXPRESSION2 EXPRESSION3 LITERAL OP1 EXPRESSION2 EXPRESSION3 LITERAL OP1 EXPRESSION2 EXPRESSION3 LITERAL | ast.Return(1,1) ast.Subtraction(1,11) ast.Subtraction(1,9) ast.LiteralInteger(1,8)[3] ast.LiteralInteger(1,10)[1] ast.LiteralInteger(1,12)[1]
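One common way to obtain left-associative nodes while still parsing by recursive descent is to fold operands in a loop: the tree built so far becomes the left child of each new operator node. The following is only a minimal sketch of that idea, not the project's Parser. The Node, Lit, and Sub classes, the subChain and atom methods, and the character-based scanning are all hypothetical stand-ins, and only single-digit literals, '-', and parentheses are handled, with no error checking.

```java
// Hypothetical stand-ins for expression nodes; NOT the provided ast package.
interface Node {
    int eval();
    String show();
}

final class Lit implements Node {
    private final int value;
    Lit(int value) { this.value = value; }
    public int eval() { return value; }
    public String show() { return Integer.toString(value); }
}

final class Sub implements Node {
    private final Node left, right;
    Sub(Node left, Node right) { this.left = left; this.right = right; }
    public int eval() { return left.eval() - right.eval(); }
    public String show() { return "Sub(" + left.show() + ", " + right.show() + ")"; }
}

final class LeftAssocSketch {
    private final String src;
    private int pos = 0;

    LeftAssocSketch(String src) { this.src = src.replace(" ", ""); }

    // subChain := atom { "-" atom }
    // The tree built so far becomes the LEFT child of every new Sub node.
    Node subChain() {
        Node left = atom();
        while (pos < src.length() && src.charAt(pos) == '-') {
            pos++;                       // consume '-'
            Node right = atom();
            left = new Sub(left, right); // fold to the left
        }
        return left;
    }

    // atom := DIGIT | "(" subChain ")"
    // Parentheses steer the parse but produce no node of their own.
    Node atom() {
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;                       // consume '('
            Node inner = subChain();
            pos++;                       // consume ')' (assumed present in this sketch)
            return inner;
        }
        pos++;                           // consume the digit
        return new Lit(c - '0');
    }

    static Node parse(String src) { return new LeftAssocSketch(src).subChain(); }

    public static void main(String[] args) {
        Node ast = parse("3-1-1");
        System.out.println(ast.show() + " = " + ast.eval());
    }
}
```

In this sketch, parsing 3-1-1 nests the first subtraction inside the second, which matches the shape of the AST in the table above, and a parenthesized literal such as (((3))) yields the same single node as 3.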
The AST sits somewhere between a parse tree and a list of instructions for a machine to follow. It contains fewer nodes than the parse tree, and organizes those nodes into semantic categories. It contains higher-level information than a list of instructions, including variable declarations and function definitions. We intend the AST to be an intermediate representation that bridges the gap between source code and machine code.
As a tree data structure, the AST is composed of nodes that inherit from the abstract base class, Command. (I didn't want to use the term "instruction".) Each Command instance stores the line number and character position of the source code where it begins. Concrete subclasses store more specific information, to faithfully represent commands that actually occur in crux source code. We create a command class to record the actions a computer takes during execution of a crux program. For example, crux has commands for declaring variables, looping, creating constants, evaluating arithmetic and logical expressions, indexing arrays, etc.
For each command in the crux source code we associate a subclass of Command. Some commands can only occur in certain parts of the crux grammar. For example, FunctionDefinition can only occur as part of a DeclarationList and not inside a StatementList. In contrast, both ArrayDeclaration and VariableDeclaration can occur in either a DeclarationList or a StatementList. We use these observations to break down the commands into 3 categories, each represented by an interface: Declaration, Statement, Expression.
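The relationship between the base class and the three category interfaces might look roughly like the sketch below. It is an illustration under assumptions, not the contents of the provided ast package: the accessor names lineNumber and charPosition, the String name fields, and the constructor signatures are hypothetical, and the real nodes likely store richer information (for example, symbols).

```java
// Marker interfaces for the three categories.
interface Declaration {}
interface Statement {}
interface Expression {}

// Assumed shape of the abstract base class: it records where the command begins.
abstract class Command {
    private final int lineNum;
    private final int charPos;

    protected Command(int lineNum, int charPos) {
        this.lineNum = lineNum;
        this.charPos = charPos;
    }

    public int lineNumber() { return lineNum; }      // hypothetical accessor name
    public int charPosition() { return charPos; }    // hypothetical accessor name

    @Override
    public String toString() {
        return getClass().getSimpleName() + "(" + lineNum + "," + charPos + ")";
    }
}

// May appear in a DeclarationList or a StatementList, so it carries both categories.
final class ArrayDeclaration extends Command implements Declaration, Statement {
    private final String name;   // hypothetical field
    ArrayDeclaration(int lineNum, int charPos, String name) {
        super(lineNum, charPos);
        this.name = name;
    }
    String name() { return name; }
}

// May only appear in a DeclarationList, so it is a Declaration only.
final class FunctionDefinition extends Command implements Declaration {
    private final String name;   // hypothetical field
    FunctionDefinition(int lineNum, int charPos, String name) {
        super(lineNum, charPos);
        this.name = name;
    }
    String name() { return name; }
}

final class LiteralInteger extends Command implements Expression {
    private final int value;
    LiteralInteger(int lineNum, int charPos, int value) {
        super(lineNum, charPos);
        this.value = value;
    }
    int value() { return value; }
}

final class CommandSketch {
    public static void main(String[] args) {
        Command lit = new LiteralInteger(2, 11, 5);
        Command fn = new FunctionDefinition(3, 1, "main");
        System.out.println(lit + " is an Expression: " + (lit instanceof Expression));
        System.out.println(fn + " is a Statement: " + (fn instanceof Statement));
    }
}
```

With this arrangement, a list type can accept only the category it is meant to hold, so placing a FunctionDefinition where a Statement is required would be rejected at compile time.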
Command | Category | Description
 | Declaration, Statement | The creation of an array.
 | Declaration | The creation of a variable.
 | Declaration | The creation of a function.
 | Expression | An embedded boolean constant, either 'true' or 'false'.
 | Expression | An embedded floating point number.
 | Expression | An embedded integer number.
 | Expression | The occurrence of an identifier as part of an expression (represents the address of that symbol).
 | Expression | Load the value at a given address.
 | Expression | Represents basic arithmetic of two other expressions.
 | Expression | Represents the comparison (greater than, greater equal, equal, not equal, lesser equal, less than) of two expressions.
 | Expression | Represents a logical operation (and, or) between two expressions or negation (not) of a single expression.
 | Expression | An operator for indexing into an array. Both the base and the amount to index are expressions.
 | Expression | A function call, including an ExpressionList of arguments.
 | Statement | An assignment of a source expression to a destination designator.
 | Statement | Represents a conditional if-else branch. Includes the condition expression, and a StatementList for each of the then and else branches.
 | Statement | Represents a while loop, including the conditional expression and a StatementList for the body.
 | Statement | A way for functions to return a value (and exit early).
 | Declaration | Represents any error which may have occurred during construction of the AST.
As the parser recursively descends through the parse tree of an input crux source code, it constructs the AST incrementally. We modify the methods responsible for recursive descent traversal so that each returns a branch of the final AST. For example, because the program method parses a list of declarations, it returns an ast.DeclarationList. Likewise, each method in the expression chain returns an Expression, being careful to implement correct associativity for the operations involved. By returning AST nodes from each method, the Parser can build up the final AST as it unwinds the recursive traversal.
From this point forward, we will not be changing the crux language to add new operations. That means we won't be adding any new classes to the Command class hierarchy. However, we will be adding new functionality to each of the existing Command classes. For example, in Lab 5: Types, we'll implement type checking and ascribe a type to each node in the AST. Rather than change all the AST nodes to add a method, we'll use the Visitor Pattern.
In the Visitor pattern, each subclass of the Command hierarchy implements an accept(Visitor visitor) method that dispatches back to the actual visitor. Any class implementing the CommandVisitor interface can provide additional functionality not present on the Command subclasses. For example, the supplied PrettyPrinter permits us to print the entire AST, but avoids adding a toPrettyString method to each of the Command classes.
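The double dispatch described above can be sketched as follows. This is a simplified illustration rather than the supplied code: only three node types are modeled, positions are omitted, the accept parameter is typed as CommandVisitor, and TreePrinter is a hypothetical visitor that merely resembles what a pretty printer might do.

```java
// One visit overload per concrete node type.
interface CommandVisitor {
    void visit(LiteralInteger node);
    void visit(Subtraction node);
    void visit(Return node);
}

abstract class Command {
    // Each subclass calls visitor.visit(this), selecting the overload for its own type.
    abstract void accept(CommandVisitor visitor);
}

final class LiteralInteger extends Command {
    final int value;
    LiteralInteger(int value) { this.value = value; }
    @Override void accept(CommandVisitor visitor) { visitor.visit(this); }
}

final class Subtraction extends Command {
    final Command left, right;
    Subtraction(Command left, Command right) { this.left = left; this.right = right; }
    @Override void accept(CommandVisitor visitor) { visitor.visit(this); }
}

final class Return extends Command {
    final Command argument;
    Return(Command argument) { this.argument = argument; }
    @Override void accept(CommandVisitor visitor) { visitor.visit(this); }
}

// Hypothetical visitor: the printing logic lives here, not in the node classes.
final class TreePrinter implements CommandVisitor {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    private void line(String text) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(text).append('\n');
    }

    public void visit(LiteralInteger node) {
        line("LiteralInteger[" + node.value + "]");
    }

    public void visit(Subtraction node) {
        line("Subtraction");
        depth++;
        node.left.accept(this);
        node.right.accept(this);
        depth--;
    }

    public void visit(Return node) {
        line("Return");
        depth++;
        node.argument.accept(this);
        depth--;
    }

    @Override public String toString() { return out.toString(); }
}

final class VisitorSketch {
    static String print(Command root) {
        TreePrinter printer = new TreePrinter();
        root.accept(printer);
        return printer.toString();
    }

    public static void main(String[] args) {
        Command tree = new Return(
            new Subtraction(
                new Subtraction(new LiteralInteger(3), new LiteralInteger(1)),
                new LiteralInteger(1)));
        System.out.print(print(tree));
    }
}
```

A later pass such as a type checker could be written as another CommandVisitor implementation without touching any of the node classes.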
Modify your Parser to produce an AST, and return it from the parse method. The AST must accurately reflect all the commands that can occur in a crux program. It's OK for the AST to contain the two symbol errors generated in Lab 3: Symbols. If the parser encounters a syntax error, it is unable to represent the crux source as an AST, and returns an Error node instead.
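One possible way to arrange this is to let the first syntax error abort the descent with an exception and convert it into an error node at the top-level parse method. The sketch below assumes that approach. QuitParseException, ErrorNode (standing in for the Error node), the whitespace-separated token handling, and the message format are all hypothetical, and the grammar is reduced to a single return statement.

```java
abstract class Command {}

final class Return extends Command {
    final int value;
    Return(int value) { this.value = value; }
}

// Stand-in for the Error node: returned when no AST can be built.
final class ErrorNode extends Command {
    final String message;
    ErrorNode(String message) { this.message = message; }
}

final class ParseSketch {
    // Assumed to be thrown by expect on the first syntax error.
    static final class QuitParseException extends RuntimeException {
        QuitParseException(String message) { super(message); }
    }

    private final String[] tokens;
    private int pos = 0;

    // Tokens are simply whitespace-separated words in this sketch.
    ParseSketch(String source) { this.tokens = source.trim().split("\\s+"); }

    // Top-level entry point: either a real AST or a single error node.
    Command parse() {
        try {
            return returnStatement();
        } catch (QuitParseException e) {
            return new ErrorNode(e.getMessage());
        }
    }

    // return_statement := "return" INTEGER ";"
    private Command returnStatement() {
        expect("return");
        int value = Integer.parseInt(expect("INTEGER"));
        expect(";");
        return new Return(value);
    }

    // Consumes and returns the current token, or aborts the parse.
    private String expect(String kind) {
        String found = pos < tokens.length ? tokens[pos] : "EOF";
        boolean ok = kind.equals("INTEGER") ? found.matches("\\d+") : found.equals(kind);
        if (!ok) {
            throw new QuitParseException("SyntaxError: expected " + kind + " but got " + found);
        }
        pos++;
        return found;
    }

    public static void main(String[] args) {
        System.out.println(new ParseSketch("return 5 ;").parse().getClass().getSimpleName());
        System.out.println(new ParseSketch("return ;").parse().getClass().getSimpleName());
    }
}
```

Because the exception unwinds every pending parser method at once, none of them needs to handle a partially built branch.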
For convenience, you may get a start on this lab by using a pre-made Lab4_AbstractSyntaxTree.zip project, which contains the ast package. As before, you are both allowed and encouraged to make your program easier to read and maintain by implementing helper functions with good names.
Parser.parse method now returns a Command node representing an AST.
Compiler.main driver to print out the returned AST.
ast package, which contains an implementation for each of the AST nodes.
Token expectRetrieve(NonTerminal nt) and Token expectRetrieve(Token.Kind kind).
ast node instead of void.
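The two expectRetrieve signatures suggest helpers that hand back the matched token, so the caller can read its lexeme and position when constructing a node. The sketch below shows one plausible behavior under that assumption. The Token fields, the reduced Kind and NonTerminal enums, the firstSet field, and the use of IllegalStateException for a mismatch are all hypothetical and much smaller than the real classes.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Reduced, hypothetical Token: just a kind, a lexeme, and a source position.
final class Token {
    enum Kind { INTEGER, TRUE, FALSE, SUB, EOF }

    final Kind kind;
    final String lexeme;
    final int line, charPos;

    Token(Kind kind, String lexeme, int line, int charPos) {
        this.kind = kind;
        this.lexeme = lexeme;
        this.line = line;
        this.charPos = charPos;
    }
}

// Hypothetical: each nonterminal knows which token kinds may begin it.
enum NonTerminal {
    LITERAL(EnumSet.of(Token.Kind.INTEGER, Token.Kind.TRUE, Token.Kind.FALSE));

    final Set<Token.Kind> firstSet;
    NonTerminal(Set<Token.Kind> firstSet) { this.firstSet = firstSet; }
}

final class ExpectSketch {
    private final List<Token> tokens;
    private int pos = 0;

    ExpectSketch(Token... tokens) { this.tokens = Arrays.asList(tokens); }

    private Token current() { return tokens.get(pos); }

    // Move to the next token, but never past the final (EOF) token.
    private void advance() {
        if (pos < tokens.size() - 1) pos++;
    }

    // Returns the current token if it has the given kind, then advances.
    Token expectRetrieve(Token.Kind kind) {
        Token tok = current();
        if (tok.kind != kind) {
            throw new IllegalStateException("expected " + kind + " but got " + tok.kind);
        }
        advance();
        return tok;
    }

    // Returns the current token if it can begin the given nonterminal, then advances.
    Token expectRetrieve(NonTerminal nt) {
        Token tok = current();
        if (!nt.firstSet.contains(tok.kind)) {
            throw new IllegalStateException("expected " + nt + " but got " + tok.kind);
        }
        advance();
        return tok;
    }

    public static void main(String[] args) {
        ExpectSketch parser = new ExpectSketch(
            new Token(Token.Kind.INTEGER, "3", 1, 8),
            new Token(Token.Kind.EOF, "", 1, 9));
        Token lit = parser.expectRetrieve(NonTerminal.LITERAL);
        System.out.println(lit.lexeme + " at (" + lit.line + "," + lit.charPos + ")");
    }
}
```

Returning the token avoids having to save the current token in a local variable before every expect call, which is where the line number and character position for each new node would otherwise come from.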
Test cases are available in this tests.zip file. The provided tests are not meant to be exhaustive. You are strongly encouraged to construct your own. (If Chrome gives you a warning that "tests.zip is not commonly downloaded and could be dangerous", it means that Google hasn't performed an automated virus scan. This warning can be safely ignored, as the tests.zip file contains only text files.)
A zip file, named Crux.zip, containing the following files (in the crux package):
We will release an AutoTester soon. So please make sure that your work meets our requirements. We reserve the right to assign 0 points to any submissions which cannot be automatically unzipped and tested. Additionally, we reserve the right to assign 0 points to any submission which 'games' the automated testing by using a lookup which produces only outputs that correspond to the test cases we happen to use. Be sure to submit the version of the project that you want graded, as we won't regrade if you submit the wrong version by accident.
Enjoy!
Adapted from a similar
document from CS 141 Winter 2013 by Alex Thornton,