Homework 1. Randomly testing compiler construction projects

Introduction

You are a reader for CS 132's Compiler Construction project, which asks students to write parsers for multiple languages, notably for a subset of Java. When you grade these parsers, you need to supply test inputs to them, but it is a pain to construct sample inputs by hand, and you'd rather automate the process. So your decide to write a program that generates random sentences from a given context free grammar, so that you can use the outputs of your program to torture-test parsers submitted by CS 132 students.

You've heard that OCaml is a good language for writing compilers and whatnot, so you decide to give it a try for this application.

However, we'll need to test your program too, so this assignment specifies exactly how you will generate "random" (actually, pseudorandom) sentences from a grammar. To test your program, we will supply a grammar, a pseudorandom number generator (PRNG), and a seed for the PRNG, all of which will be kept secret from you before the test. The desired output of your program is completely determined by these three inputs, so this will simplify our job of testing.

Definitions

symbol
A symbol used in a grammar. It can be either a nonterminal symbol or a terminal symbol; each kind of symbol has a value, whose type is arbitrary. A symbol has the following OCaml type:
type ('nonterminal, 'terminal) symbol =
  | N of 'nonterminal
  | T of 'terminal
symbol string
A list of symbols. It corresponds to the right hand side of a single production rule (i.e., a rule that has only one alternative). A symbol string can be empty.
alternative list
A nonempty list of symbol strings. It corresponds to all of a grammar's production rules for a given nonterminal symbol.
production function
A function whose argument is a nonterminal value. It returns a grammar's alternative list for that nonterminal.
grammar
A pair, consisting of a start symbol and a production function. The start symbol is a nonterminal value.
chooser
A deterministic function that accepts a pair (n,s) where n is a positive integer and s is a pseudorandom seed. (Its behavior is undefined if n is nonpositive.) When called, a chooser returns a pair (r,s1) such that r is pseudorandomly chosen from the n integers ranging from 0 through n−1, and s1 is a new pseudorandom seed that can be passed to another call to the chooser. The type of a chooser is therefore int * 'a -> int * 'a, where 'a is the seed's type. Here is an example low-quality chooser, based on a linear congruential generator that uses an integer as a seed:
let dumb_chooser (n, s) =
  let r = s mod n
  in r + (if r < 0 then n else 0), (s * 7919 + 39) mod 65521

Assignment

Write a higher-quality chooser smarter_chooser, using OCaml's Random module. For efficiency the Random module is stateful, but choosers are supposed to be stateless and are supposed to return results based on their seed values only, so you'll need to write some interface glue that models the chooser's seed using the Random module's internal state. Your chooser should invoke Random.set_state, Random.int, and Random.get_state each exactly once and in that order, and these should be the only calls to any Random function in your code.

Write a function random_sentence_prefix that accepts a quadruple consisting of a grammar, a chooser, a seed for that chooser, and a nonnegative integer limit lim. random_sentence_prefix should return a list of terminal values corresponding to the first lim symbols in a randomly generated sentence for that grammar; if the sentence has fewer than lim symbols, random_sentence_prefix should return the entire sentence. You may assume that lim is nonnegative. Use preorder traversal to generate the parse tree corresponding to the random sentence; the order is important because we want the solution to be completely determined by the grammar, chooser, and seed, so that we can test your results. In the traversal, whenever you visit a node of the parse tree and have n alternatives to choose from, choose by invoking the chooser with an argument of n and the current seed, and retain the resulting seed for your next choice. Invoke the chooser even if n is 1. By definition of "alternative list", n cannot be zero, so we will never give you test cases where there are zero alternatives.

Write two grammars for your random sentence generator, and supply three test cases for each grammar. Use the test cases below as a model for your grammars and test cases. One of your two grammars should be a transliteration of the Piglet grammar; it should be as faithful as possible as to the original, e.g., the order of production rules should be the same. In your translation of the Piglet grammar, use the string "<EOF>" to denote <EOF>, "0" to denote all <INTEGER_LITERAL>s, and "a" to denote all <IDENTIFIER>s. Call the Piglet grammar piglet_grammar and its set of nonterminals piglet_nonterminals. Call the other grammar my_grammar and its set of nonterminals my_nonterminals. Call the test cases piglet_test0 through piglet_test2 and my_test0 through my_test2. You can supply more grammars and test cases if you like.

Your code may use the Random and List modules, but it should use no other modules. Your code should be free of side effects, except for smarter_chooser's calls to Random functions. Simplicity is more important than efficiency, but your code should avoid using unnecessary time and space when it is easy to do so.

Assess your work by writing an after-action report that summarizes why you solved the problem the way you did, other approaches that you considered and rejected (and why you rejected them), and any weaknesses in your solution in the context of its intended application. This report should be a simple ASCII plain text file that consumes a page or so (at most 100 lines and 80 columns per line, please). See Resources for oral presentations and written reports for advice on how to write assessments; admittedly much of the advice there is overkill for the simple kind of report we're looking for here.

Submit

Submit three files via CourseWeb. The file hw1.ml should define smarter_chooser and random_sentence_prefix along with any auxiliary types and functions. The file hw1test.ml should contain your test cases. The file hw1.txt should hold your assessment. Please do not put your name, student ID, or other personally identifying information in your files.

Sample test cases

let dumb_chooser (n, s) =
  let r = s mod n
  in r + (if r < 0 then n else 0), (s * 7919 + 39) mod 65521

(* An example grammar for a small subset of Awk, derived from but not
   identical to the grammar in
   <http://www.cs.ucla.edu/classes/winter06/cs132/hw/hw1.html>.  *)

type awksub_nonterminals =
  | Expr | Lvalue | Incrop | Binop | Num

let awksub_grammar =
  (Expr,
   function
     | Expr ->
	 [[N Num];
	  [N Lvalue];
	  [N Incrop; N Lvalue];
	  [N Lvalue; N Incrop];
	  [N Expr; N Binop; N Expr];
	  [T"("; N Expr; T")"]]
     | Lvalue ->
	 [[T"$"; N Expr]]
     | Incrop ->
	 [[T"++"];
	  [T"--"]]
     | Binop ->
	 [[T"+"];
	  [T"-"]]
     | Num ->
	 [[T"0"]; [T"1"]; [T"2"]; [T"3"]; [T"4"];
	  [T"5"]; [T"6"]; [T"7"]; [T"8"]; [T"9"]])

let test0 =
  (random_sentence_prefix (awksub_grammar, dumb_chooser, 0, 1000)
   = ["9"])

let test1 =
  (random_sentence_prefix (awksub_grammar, dumb_chooser, 100, 1000)
   = ["("; "$"; "8"; ")"; "-"; "$"; "++"; "$"; "--"; "$"; "9"; "+";
      "("; "$"; "++"; "$"; "2"; "+"; "("; "8"; ")"; "-"; "9"; ")";
      "-"; "("; "$"; "$"; "$"; "$"; "$"; "++"; "$"; "$"; "5"; "++";
      "++"; "--"; ")"; "-"; "++"; "$"; "$"; "("; "$"; "8"; "++"; ")";
      "++"; "+"; "0"])

let test2 =
  (random_sentence_prefix (awksub_grammar, dumb_chooser, 100, 3)
   = ["("; "$"; "8"])

let test3 =
  (random_sentence_prefix (awksub_grammar, dumb_chooser, 10000, 1000)
   = ["$"; "("; "5"; ")"; "--"; "+"; "7"])

Sample use of test cases

When testing on SEASnet, make sure /usr/local/cs/bin is at the start of your path, so that you get the proper verison of OCaml. To do this, append the following lines to your $HOME/.profile file if you use bash or ksh:

export PATH=/usr/local/cs/bin:$PATH

or the following line to your $HOME/.login file if you use tcsh or csh:

set path=(/usr/local/cs/bin $path)

The command ocaml should output the version number 3.09.3.

If you put the sample test cases into a file hw1sample.ml, you should be able to use it as follows to test your hw1.ml solution on the SEASnet implementation of OCaml. Similarly, the command #use "hw1test.ml";; should run your own test cases on your solution.

$ ocaml
        Objective Caml version 3.09.3

# #use "hw1.ml";;
type ('a, 'b) symbol = N of 'a | T of 'b
…
# #use "hw1sample.ml";;
val dumb_chooser : int * int -> int * int = <fun>
type awksub_nonterminals = Expr | Lvalue | Incrop | Binop | Num
val awksub_grammar :
  awksub_nonterminals *
  (awksub_nonterminals -> (awksub_nonterminals, string) symbol list list) =
  (Expr, <fun>)
val test0 : bool = true
val test1 : bool = true
val test2 : bool = true
val test3 : bool = true
#

© 2006 Paul Eggert. See copying rules.
$Id: hw1.html,v 1.40 2007/04/11 17:15:45 eggert Exp $