Homework 4. Naive parsing of context free grammars again

Motivation

Let's try doing Homework 2 over again, this time in a different language, namely Prolog. Maybe we'll have better luck this time.

This time we won't bother to generate a matcher; we'll just have a general predicate that does matching directly. Instead of asking for a derivation, we'll just ask the grammar to be annotated to tell us what to generate; real parsers do this far more often than they generate derivations, so this is more realistic. Also, instead of passing an acceptor explicitly to the matcher, we'll make it the caller's responsibility to decide whether to accept the match. If the caller doesn't like the match, the caller can backtrack into the matcher to get a different solution. Prolog has backtracking built-in, so this is easy to implement and explain.

Assignment

Write a predicate parse_prefix/3 that accepts the following arguments:

NT, an attributed nonterminal
RS, a list of tokens
S, the unmatched suffix of RS

parse_prefix(NT,RS,S) should succeed if an initial prefix of RS can be parsed according to the grammar as an instance of NT, such that S is the remaining suffix of RS after the initial prefix is removed. parse_prefix/3 may need to backtrack if there are multiple answers. It can assume that its second argument is a ground term, that is, that its second argument contains no logical variables.

As part of the test case, you will be given a predicate rule/2 that describes the grammar. rule(NT,RHS) succeeds if there is a grammar rule whose left hand side is the attributed nonterminal NT and whose right hand side is RHS. RHS is a list of marked symbols. The arguments to rule/2 need not be ground terms.

A marked symbol is a Prolog term of the form -T, which represents the token T, or of the form +NT, which represents the attributed nonterminal NT.

An attributed nonterminal is a Prolog structure whose function symbol is the nonterminal's name and whose arguments are the attributes. In our test case the attributes are parse trees, but in general they can be any Prolog term.

Here is an example grammar that you can use for testing. You might want to put it into a file named awkgram.prolog. It is the same as Homework 2's awksub_grammar.

rule(expr(op(T,B,E)), [+term(T), +binop(B), +expr(E)]).
rule(expr(T), [+term(T)]).

rule(term(N), [+num(N)]).
rule(term(L), [+lvalue(L)]).
rule(term(pre(O,L)), [+incrop(O), +lvalue(L)]).
rule(term(post(L,O)), [+lvalue(L), +incrop(O)]).
rule(term(E), [-('('), +expr(E), -(')')]).

rule(lvalue($(E)), [-($), +expr(E)]).

rule(incrop(++), [-(++)]).
rule(incrop(--), [-(--)]).

rule(binop(+), [-(+)]).
rule(binop(-), [-(-)]).

rule(num(0), [-(0)]).
rule(num(1), [-(1)]).
rule(num(2), [-(2)]).
rule(num(3), [-(3)]).
rule(num(4), [-(4)]).
rule(num(5), [-(5)]).
rule(num(6), [-(6)]).
rule(num(7), [-(7)]).
rule(num(8), [-(8)]).
rule(num(9), [-(9)]).

Your implementation of parse_prefix/3 should not use any side effects or cuts, and should avoid using built-in Prolog predicates like member/2 and append/3 as much as possible.

Submit

Submit two files. First, submit a file hw4.prolog containing your predicate definitions. Make sure that your definitions work with gprolog, the Prolog implementation installed on SEASnet in /usr/local/cs/bin; if you type the shell command gprolog --version the first line of output should be Prolog top-Level (GNU Prolog) 1.3.0. Second, submit an ASCII text file hw4.txt that briefly describes how you arrived at your solution, focusing on alternative approaches that you rejected and why. Also, describe what happens when your implementation of parse_prefix/3 is called in an attempt to implement Homework 1, and why. For example, what happens when you call parse_prefix(expr(op(3,(+),$(4))), RS, S) or even parse_prefix(A,B,C)? Please do not put your name, student ID, or other personally identifying information in your files.

Examples

In all of these examples but the last one, the programmer repeatedly typed ";" in order to backtrack through all possible solutions. (The last example has too many solutions so only the first few are printed.) Your program does not need to return the answers in the same order as shown below. Also, your program's final response need not agree with the yes and ; no responses in the examples shown below.

$ eggert@day:~/junk$ gprolog
GNU Prolog 1.3.0
By Daniel Diaz
Copyright (C) 1999-2007 Daniel Diaz
| ?- consult('awkgram.prolog').
compiling awkgram.prolog for byte code…

yes
| ?- consult('hw4.prolog').
compiling hw4.prolog for byte code…

yes
| ?- parse_prefix(expr(E), [ouch], S).

no
| ?- parse_prefix(expr(E), [9], S).

E = 9
S = [] ? ;

no
| ?- parse_prefix(expr(E), [9,+,$,1,+], S).

E = op(9,+,$(1))
S = [+] ? ;

E = 9
S = [+,$,1,+] ? ;

no
| ?- parse_prefix(expr(E), [9,+,$,1,+], []).

no
| ?- parse_prefix(expr(E), [$,1,+,2], S).

E = op($(1),+,2)
S = [] ? ;

E = $(op(1,+,2))
S = [] ? ;

E = $(1)
S = [+,2] ? ;

no
| ?- parse_prefix(expr(E), [$,3,++,4], [S1|Srest]).

E = $(3)
S1 = ++
Srest = [4] ? ;

E = post($(3),++)
S1 = 4
Srest = [] ? ;

no
| ?- parse_prefix(expr(E), [$,3,++,4], [S1|Srest]), integer(S1).

E = post($(3),++)
S1 = 4
Srest = [] ? ;

no
| ?- parse_prefix(
     expr(E),
     ['(', $, 8, ')', -, $, ++, $, --, $, 9, +,
      '(', $, ++, $, 2, +, '(', 8, ')', -, 9, ')',
      -, '(', $, $, $, $, $, ++, $, $, 5, ++,
      ++, --, ')', -, ++, $, $, '(', $, 8, ++, ')',
      ++, +, 0],
     S).

E = op($(8),-,op($(op(pre(++,$(op(pre(--,$(op(9,+,op($(op(pre(++,$(2)),+,8)),-,9)))),-,$($($(post($(post($(pre(++,$(post($(5),++)))),++)),--))))))),-,pre(++,$(post($(post($(8),++)),++))))),+,0))
S = [] ? ;

E = op($(8),-,op($(op(pre(++,$(op(pre(--,$(op(9,+,op($(op(pre(++,$(2)),+,8)),-,9)))),-,$($(post($($(post($(pre(++,$(post($(5),++)))),++))),--)))))),-,pre(++,$(post($(post($(8),++)),++))))),+,0))
S = [] ? ;

E = op($(8),-,op($(op(pre(++,$(op(pre(--,$(op(9,+,op($(op(pre(++,$(2)),+,8)),-,9)))),-,$($(post($(post($($(pre(++,$(post($(5),++))))),++)),--)))))),-,pre(++,$(post($(post($(8),++)),++))))),+,0))
S = [] ? ;

E = op($(8),-,op($(op(pre(++,$(op(pre(--,$(op(9,+,op($(op(pre(++,$(2)),+,8)),-,9)))),-,$($(post($(post($(post($(pre(++,$($(5)))),++)),++)),--)))))),-,pre(++,$(post($(post($(8),++)),++))))),+,0))
S = [] ? At this point the user typed Return instead of ';'.

yes
| ?-