Homework 1. Randomly testing compiler construction projects

Introduction

You are a reader for CS 132's Compiler Construction project, which asks students to write parsers for multiple languages, notably for a subset of Java. When you grade these parsers, you need to supply test inputs to them, but it is a pain to construct sample inputs by hand, and you'd rather automate the process. So your decide to write a program that generates random sentences from a given context free grammar, so that you can use the outputs of your program to torture-test parsers submitted by CS 132 students.

You've heard that OCaml is a good language for writing compilers and whatnot, so you decide to give it a try for this application.

However, we'll need to test your program too, so this assignment specifies exactly how you will generate "random" (actually, pseudorandom) sentences from a grammar. To test your program, we will supply a grammar, a pseudorandom number generator (PRNG), and a seed for the PRNG, all of which will be kept secret from you before the test. The desired output of your program is completely determined by these three inputs, so this will simplify our job of testing.

Definitions

symbol

A symbol used in a grammar. It can be either a nonterminal symbol or a terminal symbol; each kind of symbol has a value, whose type is arbitrary. A symbol has the following OCaml type:

type ('nonterminal, 'terminal) symbol =
  | N of 'nonterminal
  | T of 'terminal

symbol string

A list of symbols. It corresponds to the right hand side of a single production rule (i.e., a rule that has only one alternative). A symbol string can be empty.

alternative list

A list of symbol strings. It corresponds to all of a grammar's production rules for a given nonterminal symbol. By convention, an empty alternative list [] is treated as if it were a singleton list [[]] containing the empty symbol string.

production function

A function whose argument is a nonterminal value. It returns a grammar's alternative list for that nonterminal.

grammar

A pair, consisting of a start symbol and a production function. The start symbol is a nonterminal value.

chooser

A function that accepts a a pseudorandom seed s and returns a pair (r,s₁) where r is a pseudorandomly chosen boolean value (with true and false being equally likely) and s₁ is a new pseudorandom seed that can be passed to the chooser. Choosers are deterministic functions of their inputs: if you call them multiple times with the same input, they return identical results. As a consequence of the above definitions, the type of a chooser is 'a -> bool * 'a, where 'a is the seed's type. Here is an example low-quality chooser, based on a linear congruential generator that uses an integer as a seed:

let dumb_chooser s =
  s mod 2 = 0, (s * 7919 + 39) mod 65521

Assignment

Write a higher-quality chooser smarter_chooser, using OCaml's Random module. For efficiency the Random module is stateful, but choosers are supposed to be stateless and are supposed to return results based on their seed values only, so you'll need to write some interface glue that models the chooser's seed using the Random module's internal state. Your chooser should invoke Random.set_state, Random.int, and Random.get_state each exactly once and in that order, and these should be the only calls to any Random function in your code.

Write a curried function random_sentence that accepts a grammar, a chooser, and a seed for that chooser, returning a list of terminal values corresponding to a randomly generated sentence for that grammar. Use preorder traversal to generate the parse tree corresponding to the random sentence; the order is important because we want the solution to be completely determined by the grammar, chooser, and seed, so that we can test your results. In the traversal, whenever you visit a node of the parse tree and have two or more alternatives to choose from, invoke the chooser with the current seed to obtain a boolean value and a new seed. If boolean value is true, select the first alternative in your list; otherwise, choose from the remaining alternatives; in either case, use the chooser's returned seed for any remaining choices. For example, if there are 5 alternatives they should be chosen with probability 0.5, 0.25, 0.125, 0.0625, and 0.0625, respectively. Do not invoke a chooser needlessly; for example, there is no need to invoke a chooser if there is only one alternative.

Write two grammars for your random sentence generator, and supply three test cases for each grammar. Use the test cases below as a model for your grammars and test cases. One of your two grammars should be a transliteration of the Spiglet grammar; it should be as faithful as possible as to the original, e.g., the order of production rules should be the same. In your translation of the Spiglet grammar, use the string "<EOF>" to denote <EOF>, "0" to denote all <INTEGER_LITERAL>s, and "a" to denote all <IDENTIFIER>s. Call the Spiglet grammar spiglet_grammar and its set of nonterminals spiglet_nonterminals. Call the other grammar my_grammar and its set of nonterminals my_nonterminals. Call the test cases spiglet_test0 through spiglet_test2 and my_test0 through my_test2. You can supply more grammars and test cases if you like.

Your code may use the Random and List modules, but it should use no other modules other than your own code. Your code should be free of side effects, except for smarter_chooser's calls to Random functions. Simplicity is more important than efficiency, but your code should avoid using unnecessary time and space when it is easy to do so.

Assess your work by writing an after-action report that summarizes why you solved the problem the way you did, other approaches that you considered and rejected (and why you rejected them), and any weaknesses in your solution in the context of its intended application. This report should be a simple ASCII plain text file that consumes a page or so (at most 100 lines and 80 columns per line, please). See Resources for oral presentations and written reports for advice on how to write assessments; admittedly much of the advice there is overkill for the simple kind of report we're looking for here.

Submit

Submit three files via CourseWeb. The file hw1.ml should define smarter_chooser and random_sentence along with any auxiliary types and functions. The file hw1test.ml should contain your test cases. The file hw1.txt should hold your assessment. Please do not put your name, student ID, or other personally identifying information in your files.

Sample test cases

let dumb_chooser s =
  s mod 2 = 0, (s * 7919 + 39) mod 65521

(* An example grammar for a small subset of Awk, derived from but not
   identical to the grammar in
   <http://www.cs.ucla.edu/classes/winter06/cs132/hw/hw1.html>.  *)

type awksub_nonterminals =
  | Expr | Lvalue | Incrop | Binop | Num

let awksub_grammar =
  (Expr,
   function
     | Expr ->
	 [[T"("; N Expr; T")"];
	  [N Num];
	  [N Expr; N Binop; N Expr];
	  [N Lvalue];
	  [N Incrop; N Lvalue];
	  [N Lvalue; N Incrop]]
     | Lvalue ->
	 [[T"$"; N Expr]]
     | Incrop ->
	 [[T"++"];
	  [T"--"]]
     | Binop ->
	 [[T"+"];
	  [T"-"]]
     | Num ->
	 [[T"0"]; [T"1"]; [T"2"]; [T"3"]; [T"4"];
	  [T"5"]; [T"6"]; [T"7"]; [T"8"]; [T"9"]])

let awksub_test0 =
  (random_sentence awksub_grammar dumb_chooser 0
   = ["("; "0"; ")"])

let awksub_test1 =
  (random_sentence awksub_grammar dumb_chooser 1
   = ["0"])

let awksub_test2 =
  (random_sentence awksub_grammar dumb_chooser 2
   = ["("; "0"; ")"])

let awksub_test3 =
  (random_sentence awksub_grammar dumb_chooser 6
   = ["("; "1"; "+"; "("; "3"; ")"; "+"; "("; "("; "("; "0"; "+"; "2"; "-"; "(";
      "("; "--"; "$"; "("; "("; "("; "3"; ")"; ")"; ")"; ")"; ")"; ")"; "+"; "(";
      "$"; "("; "("; "1"; ")"; ")"; ")"; ")"; ")"; ")"])

let awksub_test4 =
  (random_sentence awksub_grammar dumb_chooser 10000
   = ["("; "$"; "("; "0"; ")"; "-"; "("; "0"; "-"; "("; "("; "("; "1"; ")"; ")";
      ")"; "+"; "("; "("; "("; "("; "("; "("; "("; "0"; ")"; "-"; "("; "("; "(";
      "$"; "("; "("; "("; "1"; ")"; ")"; ")"; ")"; ")"; "+"; "("; "2"; ")"; ")";
      ")"; "+"; "("; "2"; ")"; "-"; "("; "("; "2"; "-"; "1"; ")"; "+"; "1"; ")";
      ")"; ")"; ")"; ")"; ")"; "-"; "("; "("; "("; "("; "("; "("; "0"; ")"; ")";
      ")"; ")"; "-"; "("; "("; "1"; ")"; ")"; ")"; ")"; ")"; ")"])

type giant_nonterminals =
  | Conversation | Sentence | Grunt | Snore | Shout | Quiet

let giant_grammar =
  (Conversation,
   function
     | Conversation ->
         [[N Snore];
          [N Sentence; T","; N Conversation]]
     | Snore -> [[T"ZZZ"]]
     | Sentence ->
         [[N Quiet];
          [N Grunt];
          [N Shout]]
     | Quiet ->
         []
     | Grunt ->
         [[T"khrgh"]]
     | Shout ->
         [[T"aooogah!"]])

let giant_test0 =
  (random_sentence giant_grammar dumb_chooser 0
   = ["ZZZ"])
let giant_test1 =
  (random_sentence giant_grammar dumb_chooser 1
   = [","; "ZZZ"])
let giant_test2 =
  (random_sentence giant_grammar dumb_chooser 5
   = [","; "aooogah!"; ","; ","; "khrgh"; ","; "ZZZ"])

Sample use of test cases

When testing on SEASnet, make sure /usr/local/cs/bin is at the start of your path, so that you get the proper verison of OCaml. To do this, append the following lines to your $HOME/.profile file if you use bash or ksh:

export PATH=/usr/local/cs/bin:$PATH

or the following line to your $HOME/.login file if you use tcsh or csh:

set path=(/usr/local/cs/bin $path)

The command ocaml should output the version number 3.10.0.

If you put the sample test cases into a file hw1sample.ml, you should be able to use it as follows to test your hw1.ml solution on the SEASnet implementation of OCaml. Similarly, the command #use "hw1test.ml";; should run your own test cases on your solution.

$ ocaml
        Objective Caml version 3.10.0

# #use "hw1.ml";;
type ('a, 'b) symbol = N of 'a | T of 'b
…
# #use "hw1sample.ml";;
val dumb_chooser : int -> bool * int = <fun>
type awksub_nonterminals = Expr | Lvalue | Incrop | Binop | Num
val awksub_grammar :
  awksub_nonterminals *
  (awksub_nonterminals -> (awksub_nonterminals, string) symbol list list) =
  (Expr, <fun>)
val awksub_test0 : bool = true
val awksub_test1 : bool = true
val awksub_test2 : bool = true
val awksub_test3 : bool = true
val awksub_test4 : bool = true
type giant_nonterminals =
    Conversation
  | Sentence
  | Grunt
  | Snore
  | Shout
  | Quiet
val giant_grammar :
  giant_nonterminals *
  (giant_nonterminals -> (giant_nonterminals, string) symbol list list) =
  (Conversation, <fun>)
val giant_test0 : bool = true
val giant_test1 : bool = true
val giant_test2 : bool = true
#