Homework 1. Ada±-to-C translator

The problem

Your employer Amalgamated Data Associates has a large body of inhouse code developed in a locally-maintained variant to Ada, called "Ada±". Your team has been assigned to port one of the Ada± applications to AIX, HP-UX, GNU/Linux, and Solaris; these operating systems all use C as their primary systems programming language.

As part of the port, your team needs to write C code that accesses and modifies the internal data structures of the Ada± programs. You could write C include files that are translations of the corresponding Ada± headers, but this would be a painstaking and error-prone process, and your team would have to redo the translations whenever the applications' Ada± headers change. So, to simplify the job, your team decides to prototype a translator from Ada± headers to C code, so that your application's headers can all be translated automatically.

Your boss assigns you the task of translating Ada± types to C types. These types are defined by the following grammar, which uses a notation similar to that of the Ada 95 Syntax Summary. In this grammar, IDENTIFIER stands for a contiguous sequence of ASCII letters, digits, and underscores, the first of which must be a lowercase letter. (The actual Ada± grammar excludes reserved words from the set of identifiers, but you need not worry about that detail in this assignment.) NUMERAL stands for an unsigned decimal integer constant. Other upper-case words like ARRAY and MOD stand for their lowercase counterparts as reserved words. A phrase (i.e., sequence of symbols) surrounded by { } represents zero or more occurrences of the contained phrase; for example {, enumeration_literal_specification} represents zero or more instances of enumeration_literal_specification each preceded by a comma.

type ::= array_type | nonarray_type

nonarray_type ::=
     type_name
   | enumeration_type | integer_type
   | real_type
   | record_type      | access_type

type_name ::= identifier

identifier ::= IDENTIFIER

The IDENTIFIER must not start with a uppercase letter or an underscore, and must not be a C keyword. The C keywords are auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, and while.

enumeration_type ::=
   ( enumeration_literal_specification {, enumeration_literal_specification} )

enumeration_literal_specification ::= defining_identifier

defining_identifier ::= identifier

integer_type ::=
      IDENTIFIER
    | MOD 2 ** NUMERAL

Here, IDENTIFIER must be one of the following:

IDENTIFIER corresponding C type
integer int
long_integer long int
long_long_integer long long int
short_integer short int
short_short_integer signed char

and NUMERAL must be one of the following:

NUMERAL corresponding C type
8 unsigned char
16 unsigned short int
32 unsigned int
64 unsigned long long int
real_type ::= IDENTIFIER

Here, IDENTIFIER must be one of the following:

IDENTIFIER corresponding C type
float float
long_float double
long_long_float long double
array_type ::=
   constrained_array

constrained_array ::=
   ARRAY ( discrete_subtype ) OF component

discrete_subtype ::= range

range ::= 0 .. NUMERAL

record_type ::=
   RECORD component_list END RECORD

component_list ::=
   component_item {component_item}

component_item ::=
   defining_identifier : component ;

component ::= nonarray_type

access_type ::=
    access_to_object

access_to_object ::=
    ACCESS general_access_modifier nonarray_type

general_access_modifier ::= ALL | CONSTANT

A few other comments about Ada± may be in order here. Unlike Ada, Ada± allows nonarray type definitions to be nested inside other type definitions. Conversely, Ada± arrays (unlike Ada arrays) must be zero-origin and their upper bounds must be an integer constant. Ada±, like Ada, distinguishes between access all T, which (like C's T *) is a general pointer to objects of type T, and access constant T, which (like C's T const *) can be used only to read objects and cannot be used to write them.

Assignment

Write a Prolog predicate ada2c_type(A,C) that succeeds if A is an Ada± type corresponding to the C-language type C, using the rules described above. The input and output types are represented using lists of Prolog terms; each term is either a Prolog integer (representing an integer), a Prolog atom (representing a keyword or other symbol), or a term of the form id(X) where X is a Prolog atom (representing an identifier).

You may assume that the first argument to ada2c_type/2 is a ground term, that is, it does not contain any logical variables. Your program may not use any side effects like assert, retract, or I/O; however, it may use cuts.

When multiple answers are possible, your program should backtrack through them all if requested. This can happen when translating Ada identifiers like integer, which may translate either to C keywords (e.g., int) or C identifiers (e.g., integer) depending on whether the Ada identifier is nested inside a scope where the standard identifier has been redeclared. In such cases, your program should first return the usual (C keyword) answer, and should return the alternative (C identifier) answer only after backtracking.

To turn in your assignment, submit a file ada2c.pl containing your predicate definitions. The first line of ada2c.pl should be a comment containing your name and student ID. Make sure that your definitions work with gprolog, the Prolog implementation installed on SEASnet.

Hint

Use definite clause grammars.

Examples

In these examples, the programmer repeatedly typed ";" in order to backtrack through all possible solutions. Your program's final response need not agree with the yes and ; no responses in the examples shown below; for example, it can output yes where the example below waits for the user to type ; before it outputs no.

| ?- consult('ada2c.pl').
compiling ada2c.pl for byte code...
…
yes
| ?- ada2c_type([id(integer)], C).

C = [int] ? ;

C = [id(integer)]

yes
| ?- ada2c_type([id(long_integer)], C).

C = [long,int] ? ;

C = [id(long_integer)]

yes
| ?- ada2c_type([id(float)], C).

C = [float] ? ;

no
| ?- ada2c_type([id(a_t)], C).

C = [id(a_t)]

yes
| ?- ada2c_type(['(', id(a), ')'], C).

C = [enum,'{',id(a),'}'] ? ;

no
| ?- ada2c_type(['(', id(a), ',', id(b), ')'], C).

C = [enum,'{',id(a),',',id(b),'}'] ? ;

no
| ?- ada2c_type([mod, 2, **, 64], C).

C = [unsigned,long,long,int] ? ;

no
| ?- ada2c_type([array, '(', 0, .., 10, ')', of, id(integer)], C).

C = [int,'[',11,']'] ? ;

C = [id(integer),'[',11,']'] ? ;

no
| ?- ada2c_type([array, '(', 0, .., 0, ')', of, id(a_t)], C).

C = [id(a_t),'[',1,']'] ? ;

no
| ?- ada2c_type([record,
| ?-               id(a), :, id(integer), ';',
| ?-               id(b), :, id(float), ';',
| ?-             end, record],
| ?-            C).

C = [struct,'{',int,id(a),;,float,id(b),;,'}'] ? ;

C = [struct,'{',id(integer),id(a),;,float,id(b),;,'}'] ? ;

no
| ?- ada2c_type([access, all, id(integer)], C).

C = [int,*] ? ;

C = [id(integer),*] ? ;

no
| ?- ada2c_type([access, constant, id(float)], C).

C = [float,const,*] ? ;

no
| ?- ada2c_type([access, constant, access, all, id(float)], C).

C = [float,*,const,*] ? ;

no
| ?- ada2c_type([array, '(', 0, .., 100, ')', of, record,
| ?-               id(a), :, id(integer), ';',
| ?-               id(b), :, id(float), ';',
| ?-             end, record],
| ?-            C).

C = [struct,'{',int,id(a),;,float,id(b),;,'}','[',101,']'] ? ;

C = [struct,'{',id(integer),id(a),;,float,id(b),;,'}','[',101,']'] ? ;

no
| ?- ada2c_type([access, constant,
| ?-               access, all,
| ?-                 record,
| ?-                   id(a), :, id(integer), ';',
| ?-                   id(b), :, id(float), ';',
| ?-                 end, record],
| ?-            C).

C = [struct,'{',int,id(a),;,float,id(b),;,'}',*,const,*] ? ;

C = [struct,'{',id(integer),id(a),;,float,id(b),;,'}',*,const,*] ? ;

no

The following goals should all fail immediately; that is, they should report no at the top level.

ada2c_type([], C).
ada2c_type([id(const)], C).
ada2c_type(['('], C).
ada2c_type(['(', id(a)], C).
ada2c_type(['(', id(if), ')'], C).
ada2c_type(['(', id(a), id(b), ')'], C).
ada2c_type([mod, 2, **, 63], C).
ada2c_type([mod], C).
ada2c_type([array, '(', 0, .., -1, ')', of, id(a_t)], C).
ada2c_type([array, '(', 1, .., 10, ')', of, id(a_t)], C).
ada2c_type([record,
              id(a), :, array, '(', 0, .., 100, ')', of, id(integer), ';',
              id(b), :, id(float), ';',
            end, record],
           C).
ada2c_type([access, id(integer)], C).
ada2c_type([access, constant, array, '(', 0, .., 10, ')', of, id(integer)], C).

Question and answer

Q.
What is meant by "narrative rule" and "context-dependent" in Ada 95 §1.1.4?
A.
Ada 95 is specified by two things:
  1. a context-free grammar, which is a formal description of what syntax is allowed; and
  2. some extra less-formal stuff that is written in English. This stuff is written in the form of narrative rules (i.e., rules written in English), and it specifies everything that the grammar can't specify. Since the grammar can specify only the rules of the language that are context-free, the narrative rules specify all the other (i.e., context-dependent) parts of the language.
Homework 1 talks only about (1); you don't have to worry about (2) to do the homework. For more on context-free grammars, please see Sebesta §3.3.1.

A solution

ada2c_type(Ada, C) :- type(C, [], Ada, []).

type(C, C1) --> array_type(C, C1); nonarray_type(C, C1).

nonarray_type(C, C1) -->
	enumeration_type(C, C1);
	integer_type(C, C1);
	real_type(C, C1);
	record_type(C, C1);
	access_type(C, C1);
	type_name(C, C1).

type_name(C, C1) --> id(C, C1).

id([id(X) | C], C) --> [id(X)], {\+ c_keyword(X)}.

enumeration_type([enum, '{' | C], C2) -->
	['('],
	defining_identifier(C, C1),
	enumeration_literal_specifications(C1, ['}' | C2]),
	[')'].

enumeration_literal_specifications(C, C) --> [].
enumeration_literal_specifications([',' | C1], C2) --> [','], id(C1, C2).

defining_identifier(C, C1) --> id(C, C1).

integer_type([int | C], C) --> [id(integer)].
integer_type([long, int | C], C) --> [id(long_integer)].
integer_type([long, long, int | C], C) --> [id(long_long_integer)].
integer_type([short, int | C], C) --> [id(short_integer)].
integer_type([signed, char | C], C) --> [id(short_short_integer)].
integer_type([unsigned, char | C], C) --> [mod, 2,**,8].
integer_type([unsigned, short, int | C], C) --> [mod, 2,**,16].
integer_type([unsigned, int | C], C) --> [mod, 2,**,32].
integer_type([unsigned, long, long, int | C], C) --> [mod, 2,**,64].

real_type([float | C], C) --> [id(float)].
real_type([double | C], C) --> [id(long_float)].
real_type([double | C], C) --> [id(long_long_float)].

array_type(C, C1) -->
	constrained_array(C, C1).

constrained_array(C, C2) -->
	[array, '('], discrete_subtype(C1, [']'|C2]), [')', of],
	component(C, ['['|C1]).

discrete_subtype(C, C1) --> range(C, C1).

range([T1 | C1], C1) --> [0, .., T], {integer(T), 0 =< T, T1 is T + 1}.

record_type([struct, '{' | C], C1) -->
	[record], component_list(C, C1), [end, record].

component_list(C, C2) -->
	component_item(C, ['}' | C2]).
component_list(C, C2) -->
	component_item(C, C1),
	component_list(C1, C2).

component_item(C1, C3) -->
	defining_identifier(C2, [;|C3]), [:], component(C1, C2), [;].

component(C, C1) --> nonarray_type(C, C1).

access_type(C, C1) -->
    access_to_object(C, C1).

access_to_object(C, C2) -->
    [access], general_access_modifier(C1, [*|C2]), nonarray_type(C, C1).

general_access_modifier(C, C) --> [all].
general_access_modifier([const|C], C) --> [constant].

c_keyword(auto).
c_keyword(break).
c_keyword(case).
c_keyword(char).
c_keyword(const).
c_keyword(continue).
c_keyword(default).
c_keyword(do).
c_keyword(double).
c_keyword(else).
c_keyword(enum).
c_keyword(extern).
c_keyword(float).
c_keyword(for).
c_keyword(goto).
c_keyword(if).
c_keyword(inline).
c_keyword(int).
c_keyword(long).
c_keyword(register).
c_keyword(restrict).
c_keyword(return).
c_keyword(short).
c_keyword(signed).
c_keyword(sizeof).
c_keyword(static).
c_keyword(struct).
c_keyword(switch).
c_keyword(typedef).
c_keyword(union).
c_keyword(unsigned).
c_keyword(void).
c_keyword(volatile).
c_keyword(while).

© 2004 Paul Eggert. See copying rules.
$Id: hw1.html,v 1.19 2004/05/03 22:57:24 eggert Exp $