You solved Homework 2, but your professor is unhappy with the solution. It works, but it's so complicated that the professor's other assistants are having trouble coming up to speed with it, when they need to modify it. Your professor suggests that you code it up in Prolog instead, and see whether that version is easier to understand.
+T
T
is an atom, i.e., it satisfies the
atom/1
predicate) matches the terminal
symbol that has the value T
.N
N
is an atom) matches the nonterminal symbol that has the
value n
.seq([E1,
..., En])
E1
, ..., En
, left to right. If n
= 0, it
matches only the empty string.choose([E1,
..., En])
E1
, ..., En
. If n
= 0, this does not match anything (not even the empty string). This is prioritized choice: alternatives are tested in order, and the first successful match is used unconditionally, with later alternatives excluded. For example, choose([+a, seq([+a, +b])])
is equivalent to +a
because the second alternative is excluded no matter what.star(E)
E
. This expression is greedy, that is, it always matches as many E
s as possible, and it refuses to match fewer than the maximum possible number of instances. This expression never fails to match, since it can always match the empty string if all else fails.not(E)
E
. If the current input matches E
, not(E)
does not match anything.seq([])
instead.
T
T
, which was matched with +T
.node(N, T)
N
,
which expanded into the parse tree T
. This
parse tree was matched with N
.sequence([T1,
..., Tn])
T1
, ..., Tn
, left to right. This tree was matched with sequence([E1,
..., En])
, where each Ti
was matched with Ei
.repeat([T1,
..., Tn])
T1
, ..., Tn
, left to right. This tree was matched with star(E)
, where each Ti
was matched with E
.is_not(E)
not(E)
.There is no special syntax for the parse tree when input is matched by choose([E1,
..., En])
; instead, the parse tree should simply be that of the chosen alternative.
Write a general
parser peg_parse(Exp, Tree, LM, M)
that takes the parsing expression Exp and matches it against a prefix
of the list LM of terminal symbols. If it parses successfully, Tree is
the corresponding parse tree and M is the unmatched suffix of LM.
Your parser should assume that the grammar is defined by a
predicate rule/2
whose first argument is a nonterminal
and whose second argument is the parsing expression that
the nonterminal stands for.
Also, write two good test cases for your peg_parse
function. These test cases should all be in the style of the test
cases given below. Your test cases should be named test1
and test2
. The first test case
should be based on the sample shell grammar shown below, except that
your grammar should also support if-commands, including the
tokens if
, then
, else
, elif
,
and fi
. The second test case should use a PEG of your
own. You can use the same test cases that you used to answer
Homework 2, translated into Prolog.
As with Homeworks 1 and 2, your code should be free of side effects; for example, it should not use I/O. Your code may use cuts (!/0), however. Simplicity is more important than efficiency, but your code should avoid using unnecessary time and space when it is easy to do so.
Assess your work by writing an after-action report that summarizes the strengths and weaknesses of your solution compared to a good solution for Homework 2. This report should be a simple ASCII plain text file that consumes a page or so (at most 100 lines and 80 columns per line, please). See Resources for oral presentations and written reports for advice on how to write assessments; admittedly much of the advice there is overkill for the simple kind of report we're looking for here.
We will test your program on the SEASnet Linux servers as before,
so make sure that
/usr/local/cs/bin
is at the start of your path,
using the same technique as in Homework 1.
Submit three files via CourseWeb. The file peg.prolog
should define peg_parse
along with any auxiliary
functions needed to define peg_parse
. The file
peg-test.prolog
should contain your test cases. The file
peg.txt
should hold your assessment. Please do not put
your name, student ID, or other personally identifying information in
your files.
rule(complete_command,
seq([pipe_sequence, choose([separator, empty])])).
rule(pipe_sequence,
seq([simple_command, star(seq([+'|', simple_command]))])).
rule(simple_command,
seq([cmd_name, star(choose([+word, io_redirect]))])).
rule(cmd_name,
seq([not(function_start), +word])).
rule(io_redirect,
seq([choose([+(<), +(>), +(>>), +(<>), +('>|')]), filename])).
rule(filename, +word).
rule(newline_list, seq([+'\n', linebreak])).
rule(linebreak, star(+'\n')).
rule(separator, choose([seq([separator_op, linebreak]), newline_list])).
rule(separator_op, choose([+('&'), +(;)])).
rule(function_start, seq([+word, +('(')])).
rule(empty, seq([])).
false_test(cc1, [>, word]).
true_test(cc2, [word, >], [>],
node(complete_command,
sequence(
[node(pipe_sequence,
sequence(
[node(simple_command,
sequence(
[node(cmd_name,
sequence(
[is_not(function_start),word])),
repeat([])])),
repeat([])])),
node(empty,sequence([]))]))).
true_test(cc3, [word, word, word, <, word, <>, word, '\n', '\n', '\n'], [],
node(complete_command,
sequence(
[node(pipe_sequence,
sequence(
[node(simple_command,
sequence(
[node(cmd_name,
sequence(
[is_not(function_start),word])),
repeat(
[word,word,
node(io_redirect, sequence([<,node(filename,word)])),
node(io_redirect, sequence([<>,node(filename,word)]))])])),
repeat([])])),
node(separator,
node(newline_list,
sequence(['\n', node(linebreak, repeat(['\n','\n']))])))]))).
true_test(cc4, [word, <, word, '|', word, >, word, bleagh], [bleagh],
node(complete_command,
sequence(
[node(pipe_sequence,
sequence(
[node(simple_command,
sequence(
[node(cmd_name,
sequence([is_not(function_start),word])),
repeat(
[node(io_redirect,
sequence([<, node(filename, word)]))])])),
repeat(
[sequence(
['|',
node(simple_command,
sequence(
[node(cmd_name,
sequence([is_not(function_start), word])),
repeat(
[node(io_redirect,
sequence([>,node(filename,word)]))])]))])])])),
node(empty,sequence([]))]))).
test_parse(LM, M, T) :- peg_parse(complete_command, T, LM, M).
test_failed(Name, peg_parse_should_have_failed,
input=LM, remaining=M, tree=T) :-
false_test(Name, LM),
test_parse(LM, M, T).
test_failed(Name, peg_parse_should_have_succeeded,
input=LM, remaining=MvsM1, tree=TvsT1) :-
true_test(Name, LM, M, T),
test_parse(LM, M1, T1),
(M \= M1; T \= T1),
(M = M1 -> MvsM1 = ok ; MvsM1 = (M \= M1)),
(T = T1 -> TvsT1 = ok ; TvsT1 = (T \= T1)).
If you put the sample test cases into a file
peg-sample.prolog
, you should be able to use it as
follows to test your peg.prolog
solution on the SEASnet
implementation of GNU Prolog. Similarly, the top-level
command ['peg-test.prolog'], test_failed(Name, Type, Input,
Remaining, Tree)
should run your own test cases on your
solution.
$ gprolog
GNU Prolog 1.3.1
By Daniel Diaz
Copyright (C) 1999-2009 Daniel Diaz
| ?- ['peg.prolog'].
compiling .../peg.prolog for byte code...
.../peg.prolog compiled, ...
yes
| ?- ['peg-sample.prolog'].
compiling .../peg-sample.prolog for byte code...
.../peg-sample.prolog compiled, ...
yes
| ?- test_failed(A,B,C,D,E).
no
Ford B. Parsing expression grammars: a recognition-based syntactic foundation [PDF]. Proc 31st ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), 2004, 111–22.