Homework 4. Morse Code Data Recovery

Motivation

You are working for the Library of Congress and are helping to analyze some ancient audio recordings of International Morse Code operators in action. Someone else has written a program that takes the analog signals from these recordings and turns them into a digital representation of signal-on/signal-off.

Morse code represents each letter using a string of . (dih) and - (dah) symbols. At the signal level, 1 (a signal-on of duration 1) represents dih, 111 (a signal-on of duration 3) represents dah, 0 (a signal-off of duration 1) separates dihs from dahs within a letter, 000 (signal-off of duration 3) represents the boundary between two letters, and 0000000 (signal-off of duration 7) represent the space between words.

For example, if the message is MORSE CO, its Morse code is -- --- .-. ... . followed by a word space followed by -.-. --- and its signal representation is 111011100011101110111000101110100010101000100000001110101110100011101110111; this last is the output of your colleague's program and the input to your program.

Unfortunately, the audio recordings are of poor quality, and their low-level analysis sometimes produces instances of 11 (two adjacent 1s only); these are errors, and the original signals could have been either 1 or 111. Similarly, the input to your program sometimes contains instances of 00, which could have been either 0 or 000 in the original data; of 0000, which must have been 000; and of 00000, which could be either 000 or 0000000. When the input is ambiguous, you need to evaluate either possibility, though you'll prefer to guess the shorter alternative first. You should treat any sequence of more than five 0s as if it were 0000000, and any sequence of more than three 1s as if it were 111.

For example, if the input is 11100111001111 you want to first guess that the actual data were 11101110111; if this doesn't work you'll then fall back on 1110111000111, 1110001110111, and 111000111000111, respectively.

Assignment

First, write a Prolog predicate signal_morse/2 that converts a list of 1s and 0s to the corresponding list of .s and -s. For example, the goal:

?- signal_morse([1,1,1,0,1,1,1,
  0,0,0,1,1,1,0,1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,
  0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,
  1,1,1,0,1,0,1,1,1,0,1,0,0,0,1,1,1,0,1,1,1,
  0,1,1,1], M).

should succeed with:

M = [-,-,^,-,-,-,^,.,-,.,^,.,.,.,^,.,#,-,.,-,.,^,-,-,-]

Here - and . stand for themselves; ^ stands for a boundary between letters, and # stands for a space between words. (The GNU Prolog interpreter prints the atom . as '.', but this is equivalent to unquoted . here.)

If signal_morse/2 gets an ambiguous signal list, it should generate all possible solutions, preferring solutions with shorter leftmost alternatives, as described above.

Second, write a Prolog predicate signal_message/2 that converts a list of 1s and 0s to the corresponding list of letters, interpreted according to the following Morse code table:

morse(a, [.,-]).           % A
morse(b, [-,.,.,.]).	   % B
morse(c, [-,.,-,.]).	   % C
morse(d, [-,.,.]).	   % D
morse(e, [.]).		   % E
morse(f, [.,.,-,.]).	   % F
morse(g, [-,-,.]).	   % G
morse(h, [.,.,.,.]).	   % H
morse(i, [.,.]).	   % I
morse(j, [.,-,-,-]).	   % J
morse(k, [-,.,-]).	   % K
morse(l, [.,-,.,.]).	   % L
morse(m, [-,-]).	   % M
morse(n, [-,.]).	   % N
morse(o, [-,-,-]).	   % O
morse(p, [.,-,-,.]).	   % P
morse(q, [-,-,.,-]).	   % Q
morse(r, [.,-,.]).	   % R
morse(s, [.,.,.]).	   % S
morse(t, [-]).	 	   % T
morse(u, [.,.,-]).	   % U
morse(v, [.,.,.,-]).	   % V
morse(w, [.,-,-]).	   % W
morse(x, [-,.,.,-]).	   % X
morse(y, [-,.,-,-]).	   % Y
morse(z, [-,-,.,.]).	   % Z
morse('a"', [.,-,.,-]).    % Ä (A with diaeresis)
morse(ao, [.,-,-,.,-]).    % Å (A with ring)
morse(ch, [-,-,-,-]).      % Ch (C then h)
morse('e''', [.,.,-,.,.]). % É (E with acute accent)
morse('n~', [-,-,.,-,-]).  % Ñ (N with tilde)
morse('o"', [-,-,-,.]).    % Ö (O with diaeresis)
morse('u"', [.,.,-,-]).    % Ü (U with diaeresis)
morse(0, [-,-,-,-,-]).	   % 0
morse(1, [.,-,-,-,-]).	   % 1
morse(2, [.,.,-,-,-]).	   % 2
morse(3, [.,.,.,-,-]).	   % 3
morse(4, [.,.,.,.,-]).	   % 4
morse(5, [.,.,.,.,.]).	   % 5
morse(6, [-,.,.,.,.]).	   % 6
morse(7, [-,-,.,.,.]).	   % 7
morse(8, [-,-,-,.,.]).	   % 8
morse(9, [-,-,-,-,.]).	   % 9
morse(., [.,-,.,-,.,-]).   % . (period)
morse(',', [-,-,.,.,-,-]). % , (comma)
morse(?, [.,.,-,-,.,.]).   % ? (question mark)
morse('''',[.,-,-,-,-,.]). % ' (apostrophe)
morse(!, [-,.,-,.,-,-]).   % ! (exclamation point)
morse(/, [-,.,.,-,.]).     % / (slash)
morse(:, [-,-,-,.,.,.]).   % : (colon)
morse(-, [-,.,.,.,.,-]).   % - (minus)
morse(bracket, [-,.,-,-,.,-]). % Either '(' or ')'.
morse('"', [.,-,.,.,-,.]). % " (quote mark)
morse(@, [.,-,-,.,-,.]).   % @ (at sign)
morse(=, [-,.,.,.,-]).     % = (equal sign)

% Error.
morse(error, [.,.,.,.,.,.,.,.]). % error - see below

% Prosigns.
morse(ar, [.,-,.,-,.]).          % AR (All Right)
morse(as, [.,-,.,.,.]).          % AS (wait A Second)
morse(cl, [-,.,-,.,.,-,.,.]).    % CL (CLosing down)
morse(cq, [-,.,-,.,-,-,.,-]).    % CQ (seek you)
morse(kn, [-,.,-,-,.]).          % KN (oK, Name)
morse(sk, [.,.,.,-,.,-]).        % SK (Silent Key)
morse(sos, [.,.,.,-,-,-,.,.,.]). % SOS (emergency! request aid)

For example:

?- signal_message([1,1,1,0,1,1,1,
  0,0,0,1,1,1,0,1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,
  0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,
  1,1,1,0,1,0,1,1,1,0,1,0,0,0,1,1,1,0,1,1,1,
  0,1,1,1], M).

should succeed with:

M = [m,o,r,s,e,#,c,o]

Again, # stands for the space between words, but there is no ^ at this level because the boundary between letters is understood.

If the signal contains noise, signal_message/2 should backtrack among all possible messages that it can stand for, interpreted according to the above-mentioned disambiguation rules for noisy signals. However, signal_message/2 should succeed only for disambiguated signals that correspond to a valid Morse code message.

As a special case, if your signal_message/2 implementation finds a word, followed by zero or more spaces, followed by an error token, it should omit the word, the spaces, and the error token. For example:

?- signal_message([
    1,1,1,0,1,1,1, % m
    0,0,0,
    1,1,1,0,1,1,1,0,1,1,1, % o
    0,0,0,
    1,0,1,1,1,0,1, % r
    0,0,0,
    1,0,1,0,1, % s
    0,0,0,
    1, % e
    0,0,0,0,0,0,0, % #
    1,1,1,0,1,0,1,1,1,0,1, % c
    0,0,0,
    1,1,1,0,1,1,1,0,1,1,1, % o
    0,0,0,
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1, % error
    0,0,0,
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1, % error
    0,0,0,
    1,0,1,1,1,0,1,0,1,0,1,0 % as
  ], M).

should succeed with M=[m,o,r,s,e,#,error,as], because the partial word 'CO' was erased by the first error token. The second error token was not preceded by a word and optional spaces, so it is preserved in the output.

Your implementations of signal_morse/2 and signal_message/2 may assume that their first arguments are ground terms, that is, terms that contain no logical variables.

To turn in your assignment, submit a file hw4.prolog containing your definition of signal_morse/2 and signal_message/2 and any other auxiliary definitions (including the definition of morse/2 above, if you're using it unmodified). If any extra text information is needed, other than what's in the comments, please submit it as a separate text file hw4.txt. Please do not put your name, student ID, or other personally-identifying information in your files.