Your company NanoBlat has bought another company EagerGenomics that has a suite of DNA and RNA fragment analyzers written in Prolog. One of the problems that EagerGenomics analyzes is that of the edit distance between two DNA fragments. In computer science, the edit distance between two strings is the number of insertions, deletions, or substitutions of a single character needed to transform one string into another. The situation with DNA is more complicated, and your job is to write a simple prototype that will construct sample edit-distance problems and check the results of EagerGenomics's code.
You decide first to prototype a simpler edit-distance problem than either the classic computer science problem or the DNA problem. In this prototype, you merely worry about insertions and deletions, not substitutions. Your prototype is not supposed to be specialized to DNA; it should work with RNA, protein sequences, and with other genomic variants.
Write a Prolog predicate edit_diff(Input,Commands,Output) that succeeds if the list Output is the result of starting with the list Input and then applying the edit script Commands. Input and Output can each be any list of terms, but Commands must be a list of commands. The commands must completely exhaust the input list, and must generate the entire output list. The commands are:
Also, write a Prolog predicate edit_diff_sf/3 that behaves like edit_diff/3, except that if there are two Commands of differing lengths that solve the problem, edit_diff_sf/3 yields the shorter one first. The idea is that edit_diff_sf/3 need not be as efficient as edit_diff/3.
Your implementations of edit_diff/3 and edit_diff_sf/3 should not use any side effects or cuts, and should avoid using builtin Prolog predicates like append/3 as much as possible.
edit_diff/3 and edit_diff_sf/3 may need to backtrack if there are multiple answers.
Submit two files. First, submit a file hw4.pl containing your predicate definitions. Make sure that your definitions work with gprolog, the Prolog implementation installed on SEASnet. Second, submit an ASCII text file hw4.txt that briefly describes how you arrived at your solution, focusing on alternative approaches that you rejected and why. Please do not put your name, student ID, or other personally identifying information in your files.
In these examples, the programmer repeatedly typed ";" in order to backtrack through all possible solutions. Your program does not need to return the answers in the same order as shown below, but that your implementation of edit_diff_sf/3 should return equal-length Commands answers in the same order that edit_diff/3 does. Also, your program's final response need not agree with the yes and ; no responses in the examples shown below.
$ gprolog GNU Prolog 1.3.0 By Daniel Diaz Copyright (C) 1999-2007 Daniel Diaz | ?- consult('hw4.pl'). compiling hw4.pl for byte code … yes | ?- edit_diff([a], [insert(b),copy], R). R = [b,a] yes | ?- edit_diff([a], [copy, insert(b)], R). R = [a,b] yes | ?- edit_diff([a], [copy], R). R = [a] yes | ?- edit_diff([a], [delete(a)], R). R = [] yes | ?- edit_diff([a], [delete(b)], R). no | ?- edit_diff([a], [insert(b)], R). no | ?- edit_diff(Q, [insert(a)], [a]). Q = [] yes | ?- edit_diff(Q, [delete(a)], []). Q = [a] yes | ?- edit_diff(Q, [copy, delete(a)], [b]). Q = [b,a] yes | ?- edit_diff(Q, [copy, delete(X)], [b]). Q = [b,X] yes | ?- edit_diff(Q, [insert(X), delete(Y)], [b]). Q = [Y] X = b yes | ?- edit_diff([X,a], [copy, delete(X)], R). R = [a] X = a yes | ?- edit_diff([a,b,c], [C1,C2,C3], [X,Y]). C1 = copy C2 = copy C3 = delete(c) X = a Y = b ? ; C1 = copy C2 = delete(b) C3 = copy X = a Y = c ? ; C1 = delete(a) C2 = copy C3 = copy X = b Y = c ? ; no | ?- edit_diff([a,b,c], [C1,C2,C3], [U,V,W,X]). no | ?- edit_diff([a,b], Commands, [b,c]). Commands = [insert(b),insert(c),delete(a),delete(b)] ? ; Commands = [insert(b),delete(a),insert(c),delete(b)] ? ; Commands = [insert(b),delete(a),delete(b),insert(c)] ? ; Commands = [delete(a),copy,insert(c)] ? ; Commands = [delete(a),insert(b),insert(c),delete(b)] ? ; Commands = [delete(a),insert(b),delete(b),insert(c)] ? ; Commands = [delete(a),delete(b),insert(b),insert(c)] ? ; no | ?- edit_diff([g,i,g], C, [i,g]), write(C), nl, fail. [insert(i),copy,delete(i),delete(g)] [insert(i),insert(g),delete(g),delete(i),delete(g)] [insert(i),delete(g),insert(g),delete(i),delete(g)] [insert(i),delete(g),delete(i),copy] [insert(i),delete(g),delete(i),insert(g),delete(g)] [insert(i),delete(g),delete(i),delete(g),insert(g)] [delete(g),copy,copy] [delete(g),copy,insert(g),delete(g)] [delete(g),copy,delete(g),insert(g)] [delete(g),insert(i),insert(g),delete(i),delete(g)] [delete(g),insert(i),delete(i),copy] [delete(g),insert(i),delete(i),insert(g),delete(g)] [delete(g),insert(i),delete(i),delete(g),insert(g)] [delete(g),delete(i),insert(i),copy] [delete(g),delete(i),insert(i),insert(g),delete(g)] [delete(g),delete(i),insert(i),delete(g),insert(g)] [delete(g),delete(i),delete(g),insert(i),insert(g)] no | ?- edit_diff_sf([g,i,g], C, [i,g]), write(C), nl, fail. [delete(g),copy,copy] [insert(i),copy,delete(i),delete(g)] [insert(i),delete(g),delete(i),copy] [delete(g),copy,insert(g),delete(g)] [delete(g),copy,delete(g),insert(g)] [delete(g),insert(i),delete(i),copy] [delete(g),delete(i),insert(i),copy] [insert(i),insert(g),delete(g),delete(i),delete(g)] [insert(i),delete(g),insert(g),delete(i),delete(g)] [insert(i),delete(g),delete(i),insert(g),delete(g)] [insert(i),delete(g),delete(i),delete(g),insert(g)] [delete(g),insert(i),insert(g),delete(i),delete(g)] [delete(g),insert(i),delete(i),insert(g),delete(g)] [delete(g),insert(i),delete(i),delete(g),insert(g)] [delete(g),delete(i),insert(i),insert(g),delete(g)] [delete(g),delete(i),insert(i),delete(g),insert(g)] [delete(g),delete(i),delete(g),insert(i),insert(g)] no | ?- findall(Commands, edit_diff([a,b,c,d],Commands,[b,a,d]), L), length(L, N), write(N), nl, fail. 58 no | ?- findall(Commands, edit_diff_sf([a,b,c,d],Commands,[b,a,d]), L), length(L, N), write(N), nl, fail. 58 no