Project 1. Scheme program matcher

Introduction

Your boss at Jerry-built Computer Networks Corp. (JCN) is thinking of countersuing Tortuous Data Processing Inc. (TDP), claiming that TDP stole bits of JCN's Scheme code and used it in TDP's operating system. Your boss wants you to look through TDP's code for certain coding style patterns that are emblematic of JCN's style. Every instance of those patterns will be useful in your company's potential case against TDP. There are a lot of patterns and a lot of Scheme code in TDP's operating system, so efficiency will be important. Your boss plans to run your pattern-matching code on the JCN Scheme Engine, a hardware implementation of Scheme that is blindingly fast if you limit yourself to the Scheme primitives that are implemented directly in the hardware.

In this assignment you will write a simple pattern-matcher generator for Scheme programs. Your generator will take a pattern and produce a pattern-matcher. This is akin to the POSIX regcomp regular expression compiler. However, unlike regcomp your generator will produce a procedure that can be called directly, not data to be passed to a separate regexec interpreter. For efficiency reasons, your procedure will limit itself to the JCN Scheme Engine's primitives.

It should be said right up front that this assignment is just a toy: real program analyzers are much more complicated than this, and don't rely solely on the simple kinds of backtracking used here.

The key notion of this assignment is that of a "pmatcher". A pmatcher is a procedure that inspects a given program fragment to find a match for a pattern. For example, a pmatcher for the pattern (append (or (list '+ symbol) (list '+)) (list symbol)) will succeed only on a program fragment that is a two- or three-element list whose car is + and whose remaining elements are symbols.

As you can see by mentally executing the example pattern on the fragment (+ a), pmatchers sometimes need to try multiple alternatives and to backtrack to a later alternative if an earlier one is a blind alley.

Definitions

fragment
Any object that can represent part of a Scheme program. This definition excludes infinite objects, but it includes improper lists.
pattern
an object that represents a set of fragments. It is said to match each such fragment. It takes one of the following forms:
boolean
matches any boolean. That is, it matches only #t and #f.
symbol
matches any symbol. For example, it matches cons but not 1.0.
pair
matches any pair.
list
matches any (proper) list.
char
matches any character.
vector
matches any vector.
number
matches any number.
string
matches any string.
(is procedure)
matches any object obj such that (procedure obj) returns true. For example, (list 'is (lambda (x) x)) returns a pattern that matches any object other than #f. It is an error if an error occurs during the execution of (procedure obj).
obj
matches any object. Don't forget that lists are objects.
'obj
matches any fragment that is equal? to obj. For example, '(lambda (x) x) matches (lambda (x) x) but not (lambda (y) y).
(or pat…)
matches any fragment that any pattern pat matches. For example, (or) does not match any fragment, and (or 'car 'cdr 'cons) matches car, cdr, and cons.
(cons pat1 pat2)
matches any pair whose car matches pat1 and whose cdr matches pat2. For example, the pattern (cons 'x 'y) is equivalent to the pattern '(x . y).
(list pat1 … patN)
matches any N-element list whose members match each pattern in turn. For example, the pattern (list '+ 'x 'y) is equivalent to the pattern '(+ x y). N may be zero; the pattern (list) is equivalent to the pattern '().
(append pat1 … patN pattail)
matches any concatenation of fragments that are matched by each pattern, respectively. More formally, if pat1 matches list1,… patN matches listn, and pattail matches obj, then the pattern (append pat1 … patN pattail) matches the object returned by (append list1 … listn obj). The last pattern pattail need not match a list; if it matches a non-list then the overall append pattern matches an improper list. N may be zero; the pattern (append pattail) is equivalent to pattail. None of the first N patterns may be of the form (is procedure).
(* pat)
matches any concatenation of fragments that are each matched by pat. If the matched concatenation is of two or more fragments, this pattern is equivalent to (append pat (* pat)). If the matched concatenation is of one fragment, this pattern is equivalent to pat. If the matched concatenation is of zero fragments, this pattern is equivalent to the pattern '().

A pattern is not a procedure call. For example, you do not need to define a procedure called is. The symbol is is part of the pattern returned by (list 'is boolean), but is does not need to be defined as a procedure.

pmatcher
a procedure with one arguments: a fragment frag. A pmatcher matches frag if it returns true; it fails to match frag if it returns #f.

Assignment

Write a procedure (make-pmatcher pat) that returns a pmatcher for the pattern pat.

Your implementation should use portable Scheme code, and it must be free of side effects. It should avoid using unnecessary storage. Tail recursion is desirable but is not required, and you will probably find it necessary to use non-tail-recursive procedures both in make-pmatcher and in the pmatchers that make-pmatcher returns.

There are extra constraints on the pmatchers that make-pmatcher returns. These pmatchers must use only the following Scheme primitives:

along with the following procedures and special forms: <, <=, =, >, >=, -, +, and, boolean?, car, char?, cond, cons, cdr, eq?, equal?, eqv?, if, lambda, let, letrec, list?, not, null?, number?, or, pair?, quote (or '), string?, symbol?, and vector?.

In all cases it is an error if procedure arguments are not of the proper form. For example, it is an error if the argument to make-pmatcher is not a pattern, if the first argument to a pmatcher is not a fragment, or if the procedure associated with an is pattern produces an error when applied to an object in the fragment. Your can assume that make-pmatcher and the pmatchers it returns are given valid arguments by the test program; however, you must insure that the pmatchers do not cause any errors when given valid arguments.

To turn in your assignment, submit a file pr1.ss that contains a definition of make-pmatcher along with any auxiliary definitions that make-pmatcher needs. The first line of pr1.ss should be a comment containing your name and student ID.

Examples

The following examples show how make-pmatcher might be used in a test program. Please do not submit them as part of pr1.ss.

(define expr 'obj)
(define exprs '(* (list obj)))
(define star-obj '(* obj))
(define unary-fn `(append (list 'lambda (list symbol)) ,exprs))
(define binary-fn `(append (list 'lambda (list symbol symbol)) ,exprs))
(define unary-or-binary-fn `(or ,unary-fn ,binary-fn))
(define nary-fn `(append (list 'lambda (* (list symbol))) ,exprs))
(define nary-fn-defn `(list 'define symbol ,nary-fn))

; For each pattern defined above, use make-pmatcher to create a
; pmatcher that matches the pattern.
(define pmatch-expr (make-pmatcher expr))
(define pmatch-exprs (make-pmatcher exprs))
(define pmatch-star-obj (make-pmatcher star-obj))
(define pmatch-unary-fn (make-pmatcher unary-fn))
(define pmatch-binary-fn (make-pmatcher binary-fn))
(define pmatch-unary-or-binary-fn (make-pmatcher unary-or-binary-fn))
(define pmatch-nary-fn (make-pmatcher nary-fn))
(define pmatch-nary-fn-defn (make-pmatcher nary-fn-defn))

; Test the pmatchers.
(pmatch-expr 0)  ===>  true
(pmatch-exprs 0)  ===>  #f
(pmatch-exprs '(0))  ===>  true
(pmatch-star-obj 0)  ===>  true
(pmatch-unary-fn '(lambda () #f))  ===>  #f
(pmatch-unary-fn '(lambda () #f))  ===>  #f
(pmatch-unary-fn '(lambda (x) x))  ===>  true
(pmatch-unary-fn '(lambda (x y) x))  ===>  #f
(pmatch-binary-fn '(lambda () #f))  ===>  #f
(pmatch-binary-fn '(lambda (x) x))  ===>  #f
(pmatch-binary-fn '(lambda (x y) x))  ===>  true
(pmatch-binary-fn '(lambda (x y z) x))  ===>  #f
(pmatch-unary-or-binary-fn '(lambda () #f))  ===>  #f
(pmatch-unary-or-binary-fn '(lambda (x) x))  ===>  true
(pmatch-unary-or-binary-fn '(lambda (x y) x))  ===>  true
(pmatch-unary-or-binary-fn '(lambda (x y z) x))  ===>  #f
(pmatch-nary-fn '(lambda () #f))  ===>  true
(pmatch-nary-fn '(lambda (x) x))  ===>  true
(pmatch-nary-fn '(lambda (x y) x))  ===>  true
(pmatch-nary-fn '(lambda (x y z) x))  ===>  true
(pmatch-nary-fn-defn '(lambda (x y z) x))  ===>  #f
(pmatch-nary-fn-defn '(define foo (lambda (x y z) x)))  ===>  true


; Some more examples.

(define allowed-procedures
  '(or '< '<= '= '> '>= '- '+ 'boolean?
       'car 'char? 'cons 'cdr 'eq? 'equal? 'eqv?
       'list? 'not 'null? 'number?
       'pair? 'string? 'symbol? 'vector?))

(define other-allowed-special-forms
  '(or 'and 'cond 'if 'let 'letrec 'or 'quote))

(define lambda-args
  '(or symbol
       (* (list symbol))
       (append (* (list symbol)) symbol)))

(define lambda-expr `(append (list 'lambda ,lambda-args) ,exprs))
(define quote-expr '(list 'quote obj))
(define call-expr `(cons (or ,lambda-expr ,allowed-procedures) ,exprs))

(define allowed-primitives
  `(or number symbol ,call-expr (cons ,other-allowed-special-forms ,exprs)))

(define pmatch-*-boolean (make-pmatcher '(* boolean)))
(define pmatch-*-list-boolean (make-pmatcher '(* (list boolean))))
(define pmatch-nil-a (make-pmatcher '(append '() 'a)))
(define pmatch-a-a (make-pmatcher '(append 'a 'a)))
(define pmatch-star-a (make-pmatcher '(* 'a)))
(define pmatch-star-list-a (make-pmatcher '(* (list 'a))))
(define pmatch-list (make-pmatcher 'list))
(define pmatch-list-list (make-pmatcher '(list list)))
(define pmatch-allowed-procedures (make-pmatcher allowed-procedures))
(define pmatch-other-allowed-special-forms
  (make-pmatcher other-allowed-special-forms))
(define pmatch-lambda-args (make-pmatcher lambda-args))
(define pmatch-lambda-expr (make-pmatcher lambda-expr))
(define pmatch-quote-expr (make-pmatcher quote-expr))
(define pmatch-call-expr (make-pmatcher call-expr))
(define pmatch-allowed-primitives (make-pmatcher allowed-primitives))

(pmatch-*-boolean #t)  ===>  true
(pmatch-*-boolean #f)  ===>  true
(pmatch-*-boolean '())  ===>  true
(pmatch-*-boolean '(#f))  ===>  #f
(pmatch-*-list-boolean #t)  ===>  #f
(pmatch-*-list-boolean #f)  ===>  #f
(pmatch-*-list-boolean '())  ===>  true
(pmatch-*-list-boolean '(#f))  ===>  true
(pmatch-*-list-boolean '(#f #t))  ===>  true
(pmatch-*-list-boolean '(#f (#t)))  ===>  #f
(pmatch-nil-a 'a)  ===>  true
(pmatch-nil-a '())  ===>  #f
(pmatch-nil-a '(() . a))  ===>  #f
(pmatch-nil-a '(a . a))  ===>  #f
(pmatch-a-a 'a)  ===>  #f
(pmatch-a-a '())  ===>  #f
(pmatch-a-a '(() . a))  ===>  #f
(pmatch-a-a '(a . a))  ===>  #f
(pmatch-star-a 'a)  ===>  true
(pmatch-star-a '())  ===>  true
(pmatch-star-a '(a))  ===>  #f
(pmatch-star-a '(a a))  ===>  #f
(pmatch-star-list-a 'a)  ===>  #f
(pmatch-star-list-a '())  ===>  true
(pmatch-star-list-a '(a))  ===>  true
(pmatch-star-list-a '(a a))  ===>  true
(pmatch-list '())  ===>  true
(pmatch-list 'a)  ===>  #f
(pmatch-list '(a))  ===>  true
(pmatch-list '(a a))  ===>  true
(pmatch-list '((a) a))  ===>  true
(pmatch-list '((a) (a)))  ===>  true
(pmatch-list-list '())  ===>  #f
(pmatch-list-list '(()))  ===>  true
(pmatch-list-list 'a)  ===>  #f
(pmatch-list-list '(a))  ===>  #f
(pmatch-list-list '(a a))  ===>  #f
(pmatch-list-list '((a) a))  ===>  #f
(pmatch-list-list '((a) (a)))  ===>  #f
(pmatch-list-list '((a)))  ===>  true
(pmatch-allowed-procedures 'or)  ===>  #f
(pmatch-allowed-procedures 'number?)  ===>  true
(pmatch-allowed-procedures 'modulo)  ===>  #f
(pmatch-other-allowed-special-forms 'or)  ===>  true
(pmatch-other-allowed-special-forms 'number?)  ===>  #f
(pmatch-other-allowed-special-forms '+)  ===>  #f
(pmatch-lambda-args '())  ===>  true
(pmatch-lambda-args '(x))  ===>  true
(pmatch-lambda-args '(x y))  ===>  true
(pmatch-lambda-args '(x y . z))  ===>  true
(pmatch-lambda-args 'z)  ===>  true
(pmatch-lambda-args 0)  ===>  #f
(pmatch-lambda-args '(0))  ===>  #f
(pmatch-lambda-args '(()))  ===>  #f
(pmatch-lambda-expr '(lambda x x))  ===>  true
(pmatch-lambda-expr '(lambda (x . y) y))  ===>  true
(pmatch-lambda-expr '(lambda x (x) (x)))  ===>  true
(pmatch-lambda-expr '(lambda ((x) (x))))  ===>  #f
(pmatch-quote-expr 'x)  ===>  #f
(pmatch-quote-expr ''x)  ===>  true
(pmatch-quote-expr '''x)  ===>  true
(pmatch-quote-expr '())  ===>  #f
(pmatch-call-expr '())  ===>  #f
(pmatch-call-expr '(modulo))  ===>  #f
(pmatch-call-expr '(cons))  ===>  true
(pmatch-call-expr 'cons)  ===>  #f
(pmatch-call-expr '((lambda (x) x) 1 2))  ===>  true
(pmatch-call-expr '(lambda (x) x))  ===>  #f
(pmatch-allowed-primitives 12)  ===>  true
(pmatch-allowed-primitives 'a)  ===>  true
(pmatch-allowed-primitives '(modulo 3 4))  ===>  #f
(pmatch-allowed-primitives '(lambda (x) x))  ===>  #f
(pmatch-allowed-primitives '(if (null? x) (cons x y)))  ===>  true

Hint

See the DNA fragment analyzer problem that was assigned in a previous quarter, along with the solution at the end of the problem.


© 2003 Paul Eggert. See copying rules.
$Id: pr1.html,v 1.15 2003/10/22 06:02:57 eggert Exp $