Homework 1. Scheme code difference analyzer

The problem

Tortuous Data Processing Inc. (TDP) is suing your employer Jerry-built Computer Networks Corp. (JCN), and is claiming that JCN stole large bodies of TDP's code and incorporated it into JCN's operating system. As part of the legal discovery process, JCN has obtained copies of TDP's operating system source code, and wants to compare it to TDP's to see whether the suit's claims have any merit. About 5% of both operating systems are written in Scheme. Your team has been assigned the job of comparing the Scheme parts.

Your boss suggests that you start by writing a procedure compare-code that compares two Scheme expressions x and y, and produces an annotated difference summary of where the two inputs are the same and where they differ. This low-level procedure will be used often as part of a higher-level process written by other members of your team, so it needs to be relatively efficient.

After consulting with other members of your team you decide that the difference summary should take the same form as the inputs where they are the same, and should be of the form (<> x y) where they differ. More precisely, (compare-code x y) by default should return (<> x y), except that the following rules should be applied recursively to simplify the output:

You examine all the Scheme code to be compared, and discover that it has no occurrences of <>, so you do not have to worry about it appearing in the input arguments. Also, you and your team decide not to worry about applying the rule recursively to non-pairs (e.g., vectors), since they occur so rarely in the code; you need to worry only about recursing through pairs and lists.

Assignment

Write a Scheme procedure (compare-code x y) that implements the specification described above. Your implementation should work on any pair of objects that do not contain <> anywhere.

Your implementation must be free of side effects; for example you cannot use the set! procedure. Also, your implementation should avoid using unnecessary storage: that is, it should attempt to minimize the number of new cons cells in the value that it returns. Returned values may share storage with arguments; they need not copy their arguments. Speed efficiency is also desirable, as your implementation will be invoked many times.

To turn in your assignment, submit a file hw1.ss containing the definition of compare-code along with any other auxiliary definitions needed to run it. The first line of hw1.ss should be a comment containing your name and student ID. Make sure that your definitions work with mzscheme, the Scheme implementation installed on SEASnet.

Examples

(compare-code '() '())  ===>  ()
(compare-code '() #f)  ===>  (<> () #f)
(compare-code 'a '(cons a b))  ===>  (<> a (cons a b))
(compare-code '(cons a b) '(cons a b))  ===>  (cons a b)
(compare-code '(cons a b) '(cons a c))  ===>  (cons a (<> b c))
(compare-code '(cons a b) '(list a b))  ===>  ((<> cons list) a b)
(compare-code '(car a) '(cons a b))  ===>  ((<> car cons) a . (<> () (b)))

; See discussion below for more about this example.
(compare-code '(cons a b) '(list b a c))
===>
  ((<> cons list) (<> a b) (<> b a) . (<> () (c)))

(define TDP-match-*-defn
  '(define match-*
     (lambda (matcher frag accept)
       (or (accept frag)
           (matcher frag
                    (lambda (frag1)
                      (and (not (eq? frag frag1))
                           (match-* matcher frag1 accept))))))))
(define JCN-match-*-defn
  '(define match-*
     (lambda (matcher frag acc)
       (or (acc frag)
           (matcher frag
                    (lambda (frag1)
                      (and (not (eq? frag frag1))
                           (match-* matcher frag1 acc))))))))
(define match-*-comparison
  (compare-code TDP-match-*-defn JCN-match-*-defn))

match-*-comparison
===>
  (define match-*
    (lambda (matcher frag (<> accept acc))
      (or ((<> accept acc) frag)
	  (matcher frag
		   (lambda (frag1)
		     (and (not (eq? frag frag1))
			  (match-* matcher frag1 (<> accept acc))))))))

(define cdadaddaddr
  (lambda (x)
    (cdr (car (cdr (car (cdr (cdr (car (cdr (cdr x)))))))))))

(cdadaddaddr match-*-comparison)  ===>  (frag)
(cdadaddaddr TDP-match-*-defn)  ===>  (frag)
(cdadaddaddr JCN-match-*-defn)  ===>  (frag)

(eq? match-*-comparison TDP-match-*-defn)  ===>  #f
(eq? (cdadaddaddr match-*-comparison)
     (cdadaddaddr TDP-match-*-defn))  ===>  #t
(eq? (cdadaddaddr match-*-comparison)
     (cdadaddaddr JCN-match-*-defn))  ===>  unspecified; either #t or #f

Discussion of example

The example call (compare-code '(cons a b) '(list b a c)) is equivalent to:

(compare-code
   '(cons a b . ())
   '(list b a . (c))
   )

If you line up the differing components, you get the result shown above. The result can also be derived from the rules (which start with "More precisely" above). According to these rules this example should return (<> (cons a b) (list b a c)), except that the rules rewrite this answer according to the following steps:

(<> (cons a b) (list b a c))
((<> cons list) . (<> (a b) (b a c)))
((<> cons list) . ((<> a b) . (<> (b) (a c))))
((<> cons list) . ((<> a b) . ((<> b a) . (<> () (c)))))

and the last term is equivalent to:

((<> cons list) (<> a b) (<> b a) . (<> () (c)))

which the Scheme printer will display as follows:

((<> cons list) (<> a b) (<> b a) <> () (c))

A solution

(define compare-code
  (lambda (x y)
    (cond
     ((eqv? x y) x)
     ((and (pair? x) (pair? y))
      (let ((a (compare-code (car x) (car y)))
	    (d (compare-code (cdr x) (cdr y))))
	(if (and (eq? a (car x))
		 (eq? d (cdr x)))
	    x
	    (cons a d))))
     (#t (list '<> x y)))))

Test cases used for grading

20 test cases were used. Each was worth 5 points; they were individually all or nothing. The last test case was an efficiency test: quadratic implementations (for example, those that called (equal? x y) at each level of recursion) got 0 points.

call expected result
(compare-code '() '()) ()
(compare-code '() '#f) (<> () #f)
(compare-code 'a '(cons a b)) (<> a (cons a b))
(compare-code '(cons a b) '(cons a b)) (cons a b)
(compare-code '(cons a b) '(cons a c)) (cons a (<> b c))
(compare-code '(cons a b) '(list a b)) ((<> cons list) a b)
(compare-code '(car a) '(cons a b)) ((<> car cons) a <> () (b))
(compare-code '(cons a b) '(list b a c)) ((<> cons list) (<> a b) (<> b a) <> () (c))
(cdadaddaddr match-*-comparison) (frag)
(cdadaddaddr TDP-match-*-defn) (frag)
(cdadaddaddr JCN-match-*-defn) (frag)
(eq? match-*-comparison TDP-match-*-defn) #f
(eq? (cdadaddaddr match-*-comparison) (cdadaddaddr TDP-match-*-defn)) #t
(eq? (cdadaddaddr match-*-comparison) (cdadaddaddr JCN-match-*-defn)) #f
(compare-code '#t '#f) (<> #t #f)
(compare-code '(a . b) '(a . b)) (a . b)
(compare-code '(a . b) '(a)) (a <> b ())
(compare-code '(cons a b . ()) '(list b a . (c))) ((<> cons list) (<> a b) (<> b a) <> () (c))
(compare-code '(car (a (b c) . x)) '(list (a (d c) . y))) ((<> car list) (a ((<> b d) c) <> x y))
(bigtest 10000) answer omitted

© 2003 Paul Eggert. See copying rules.
$Id: hw1.html,v 1.11 2003/10/24 07:24:51 eggert Exp $