Homework 4. Naive parsing of context free grammars again

Motivation

Let's try doing Homework 2 over again, this time in a different language, namely Prolog. Maybe we'll have better luck this time.

This time we won't bother to generate a matcher; we'll just have a general predicate that does matching directly. Instead of asking for a derivation, we'll just ask the grammar to be annotated to tell us what to generate; real parsers do this far more often than they generate derivations, so this is more realistic. Also, instead of passing an acceptor explicitly to the matcher, we'll make it the caller's responsibility to decide whether to accept the match. If the caller doesn't like the match, the caller can backtrack into the matcher to get a different solution. Prolog has backtracking built-in, so this is easy to implement and explain.

Assignment

Write a predicate parse_prefix/3 that accepts the following arguments:

NT, an attributed nonterminal
RS, a list of tokens
S, the unmatched suffix of RS

parse_prefix(NT,RS,S) should succeed if an initial prefix of RS can be parsed according to the grammar as an instance of NT, such that S is the remaining suffix of RS after the initial prefix is removed. parse_prefix/3 may need to backtrack if there are multiple answers. It can assume that RS is a ground term, that is, that RS contains no logical variables.

As part of the test case, you will be given a predicate =>/2 that describes the grammar. NT=>RHS succeeds if there is a grammar rule whose left hand side is the attributed nonterminal NT and whose right hand side is RHS. RHS is a list of symbols. The arguments to =>/2 need not be ground terms. Each rule in the grammar is considered separately with respect to the logical variables in the grammar; that is, if two different rules both mention a logical variable X, there are two different variables both named "X".

You will need to define the '=>' operator; it is not built in.

A symbol is either an atomic term A, which represents the token A, or a compound term C, which represents the attributed nonterminal C. An atomic term is either an atom, an integer, or a floating point number; a compound term always has positive arity.

An attributed nonterminal is a Prolog structure whose function symbol is the nonterminal's name and whose arguments are the attributes. In our test case the attributes are parse trees, but in general they can be any Prolog term.

Here is an example grammar that you can use for testing. You might want to put it into a file named awkgram.prolog. It is the same as Homework 2's awksub_grammar.

  expr(op(T,B,E)) => [term(T), binop(B), expr(E)].
  expr(T) => [term(T)].

  term(N) => [num(N)].
  term(L) => [lvalue(L)].
  term(pre(O,L)) => [incrop(O), lvalue(L)].
  term(post(L,O)) => [lvalue(L), incrop(O)].
  term(E) => ['(', expr(E), ')'].

  lvalue($(E)) => [$, expr(E)].

  incrop(++) => [++].
  incrop(--) => [--].

  binop(+) => [+].
  binop(-) => [-].

  num(0) => [0].
  num(1) => [1].
  num(2) => [2].
  num(3) => [3].
  num(4) => [4].
  num(5) => [5].
  num(6) => [6].
  num(7) => [7].
  num(8) => [8].
  num(9) => [9].

Your implementation of parse_prefix/4 should not use any side effects or cuts, and should avoid using built-in Prolog predicates as much as possible. As an exception, you may use the side-effect-free predicates defined in the Type testing, Term processing and List processing sections of the GNU Prolog manual. Note that setarg has side effects and cannot be used.

Submit

Submit two files. First, submit a file hw4.prolog containing your predicate definitions. Make sure that your definitions work with gprolog, the Prolog implementation installed on SEASnet in /usr/local/cs/bin; if you type the shell command gprolog --version the first line of output should be Prolog top-Level (GNU Prolog) 1.4.0.

Second, submit an ASCII text file hw4.txt that briefly describes how you arrived at your solution, focusing on alternative approaches that you rejected and why. Also, describe what happens when your implementation of parse_prefix/3 is called in an attempt to generate test cases for Homework 2, and why. For example, what happens when you call parse_prefix(expr(op(3,(+),$(4))), RS, S) or even parse_prefix(A,B,C)?

Please do not put your name, student ID, or other personally identifying information in your files.

Examples

In all of these examples, the programmer repeatedly typed ";" in order to backtrack through all possible solutions. (The last example has too many solutions so only the first few are printed.) Your program does not need to return the answers in the same order as shown below. Also, your program's final response need not agree with the yes and ; no responses in the examples shown below.

$ eggert@day:~/junk$ gprolog
GNU Prolog 1.4.0
By Daniel Diaz
Copyright (C) 1999-2011 Daniel Diaz
| ?- consult('hw4.prolog').
compiling hw4.prolog for byte code…

yes
| ?- consult('awkgram.prolog').
compiling awkgram.prolog for byte code…

yes
| ?- parse_prefix(expr(E), [ouch], S).

no
| ?- parse_prefix(expr(E), [9], S).

E = 9
S = [] ? ;

no
| ?- parse_prefix(expr(E), [9,+,$,1,+], S).

E = op(9,+,$(1))
S = [+] ? ;

E = 9
S = [+,$,1,+] ? ;

no
| ?- parse_prefix(expr(E), [9,+,$,1,+], []).

no
| ?- parse_prefix(expr(E), [$,1,+,2], S).

E = op($(1),+,2)
S = [] ? ;

E = $(op(1,+,2))
S = [] ? ;

E = $(1)
S = [+,2] ? ;

no
| ?- parse_prefix(expr(E), [$,3,++,4], [S1|Srest]).

E = $(3)
S1 = ++
Srest = [4] ? ;

E = post($(3),++)
S1 = 4
Srest = [] ? ;

no
| ?- parse_prefix(expr(E), [$,3,++,4], [S1|Srest]), integer(S1).

E = post($(3),++)
S1 = 4
Srest = [] ? ;

no
| ?- parse_prefix(
     expr(E),
     ['(', $, 8, ')', -, $, ++, $, --, $, 9, +,
      '('],
     S).

E = op($(8),-,$(pre(++,$(pre(--,$(9))))))
S = [+,'('] ? ;

E = $(8)
S = [-,$,++,$,--,$,9,+,'('] ? ;

no
| ?-