Assignment 6. System call programming and debugging

Laboratory: Buffered versus unbuffered I/O

As usual, keep a log in the file lab.txt of what you do in the lab so that you can reproduce the results later. This should not merely be a transcript of what you typed: it should be more like a true lab notebook, in which you briefly note down what you did and what happened.

For this laboratory, you will implement transliteration programs tr2b and tr2u that use buffered and unbuffered I/O respectively, and compare the resulting implementations and performance. Each implementation should be a main program that takes two operands, from and to, that are byte strings of the same length, and that copies standard input to standard output, transliterating every byte in from to the corresponding byte in to. Your implementations should report an error if from and to are not the same length, or if from has duplicate bytes. To summarize, your implementations should act like the standard utility tr does, except that they have no options, characters like [, - and \ have no special meaning in the operands, operand errors must be diagnosed, and your implementations act on bytes rather than on (possibly multibyte) characters.
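Both programs have to validate their operands in the same way before copying anything. Below is a minimal sketch of that step. It uses a hypothetical helper, build_map, which is not part of the assignment. The helper checks the lengths, rejects duplicate bytes in from, and fills a 256-entry table, so the copy loop can translate each byte with a single array lookup. Because the operands arrive as C strings in argv, they cannot contain null bytes, so strlen gives their lengths.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: validate the from/to operands and fill a 256-entry
   byte map.  Returns NULL on success, or a diagnostic string on error. */
static const char *build_map(const char *from, const char *to, unsigned char map[256])
{
    size_t len = strlen(from);
    if (len != strlen(to))
        return "from and to must have the same length";

    /* Start with the identity mapping: bytes not in 'from' pass through. */
    for (int b = 0; b < 256; b++)
        map[b] = (unsigned char) b;

    /* seen[b] records whether byte b has already appeared in 'from'. */
    unsigned char seen[256] = {0};
    for (size_t i = 0; i < len; i++) {
        unsigned char f = (unsigned char) from[i];
        if (seen[f])
            return "from contains duplicate bytes";
        seen[f] = 1;
        map[f] = (unsigned char) to[i];
    }
    return NULL;
}
```

A main program could check that argc is 3 and then call build_map(argv[1], argv[2], map). If the call returns a message, the program would print it to standard error and exit with a nonzero status.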

  1. Write a C transliteration program tr2b.c that uses getchar and putchar to transliterate bytes as described above.
  2. Write a C program tr2u.c that uses read and write to transliterate bytes, instead of using getchar and putchar. The nbyte arguments to read and write should be 1, so that the program reads and writes single bytes at a time.
  3. Use the strace command to compare the system calls issued by your tr2b and tr2u commands (a) when copying one file to another, and (b) when copying a file to your terminal. Use a file that contains at least 5,000,000 bytes.
  4. Use the time command to measure how much faster one program is, compared to the other, when copying the same amount of data.
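The two programs differ only in how they move bytes. Below is a hedged sketch of each copy loop. It assumes a 256-entry table map in which map[b] is the replacement for byte b, with the identity mapping for bytes not in from. The function names are hypothetical. In the buffered loop, the streams are parameters so that the loop can be tested on other files. This is faithful to tr2b because getchar() is equivalent to getc(stdin) and putchar(c) is equivalent to putc(c, stdout).

```c
#include <stdio.h>
#include <unistd.h>

/* Buffered copy, as tr2b might do it; stdio batches the underlying system
   calls.  Returns 0 on success, -1 on an I/O error. */
static int translit_buffered(FILE *in, FILE *out, const unsigned char map[256])
{
    int c;
    while ((c = getc(in)) != EOF)
        if (putc(map[c], out) == EOF)
            return -1;
    /* EOF from getc means end-of-file *or* a read error; tell them apart. */
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return 0;
}

/* Unbuffered copy, as tr2u might do it: one read and one write system call
   per byte, with nbyte equal to 1 as the assignment requires. */
static int translit_unbuffered(int in, int out, const unsigned char map[256])
{
    unsigned char b;
    for (;;) {
        ssize_t n = read(in, &b, 1);
        if (n == 0)
            return 0;          /* end of file */
        if (n < 0)
            return -1;         /* read error (a real program might retry on EINTR) */
        b = map[b];
        if (write(out, &b, 1) != 1)
            return -1;
    }
}
```

tr2b's main would call translit_buffered(stdin, stdout, map). tr2u's main would call translit_unbuffered(0, 1, map), where 0 and 1 are the standard input and output file descriptors. Under strace, tr2u should show one read and one write per byte. tr2b should show far fewer calls, because stdio transfers data in blocks, or in lines when standard output is a terminal.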

Homework: Encrypted sort revisited

Rewrite the sfrob program you wrote previously so that it uses system calls rather than <stdio.h> to read standard input and write standard output. Your program should use a small number of read system calls, ordinarily by determining the input file's size when possible and allocating a buffer that is slightly larger than that size. If the input's size cannot easily be determined (for example, because the input is a pipe), your program can start with a small buffer of 8 KiB. Either way, your program should repeatedly append data to the buffer by reading the data (passing the largest count to read that cannot overrun the buffer) until read reports an error or end-of-file; if the buffer fills up, your program should reallocate the buffer to be twice as large as it was before and then resume reading. Assuming enough memory exists, this approach should work even if the input file grows while your program is reading it, which is something that you should be able to test.
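The reading strategy above can be sketched as follows. The sketch uses a hypothetical helper, read_all, that returns a malloc'd buffer holding all of the input. It trusts fstat's st_size only for regular files, because for pipes and terminals that field does not describe how much data will arrive.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd into one heap buffer.  On
   success returns the buffer and stores the byte count in *lenp; on failure
   returns NULL. */
static char *read_all(int fd, size_t *lenp)
{
    struct stat st;
    size_t cap = 8 * 1024;                 /* fallback when the size is unknown */
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0)
        cap = (size_t) st.st_size + 1;     /* one spare byte: EOF is seen without a realloc */

    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    size_t len = 0;
    for (;;) {
        if (len == cap) {                  /* buffer full: double it and keep reading */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        /* Ask for as much as fits without overrunning the buffer. */
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)
            break;                         /* end of file */
        if (n < 0) {
            free(buf);
            return NULL;
        }
        len += (size_t) n;
    }
    *lenp = len;
    return buf;
}
```

For an unchanging regular file, this sketch issues just two reads: one that fills the file's bytes and one that returns 0. A file that grows during the run simply triggers one or more doublings. The sketch does not guard against cap * 2 overflowing size_t.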

When debugging, you may find the AddressSanitizer (asan) and the Undefined Behavior Sanitizer (ubsan) useful; these can be enabled with the GCC options -fsanitize=address and -fsanitize=undefined, respectively.

Your program should do one thing in addition to sfrob. If given the -f option, your program should ignore case while sorting by using the standard toupper function to upper-case each byte after decrypting it and before comparing it. You can assume that each input byte represents a separate character; that is, you need not worry about multi-byte encodings. As noted in its specification, toupper's argument should be either EOF or a nonnegative value that is at most UCHAR_MAX (as defined in <limits.h>). Hence one cannot simply pass a char value to toupper, because char is in the range CHAR_MIN..CHAR_MAX, which includes negative values on platforms where char is signed. Convert each byte to unsigned char before passing it to toupper.
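Below is a sketch of a case-folding comparison for -f. It rests on two assumptions carried over from the earlier sfrob assignment rather than from this text. First, each stored byte was frobnicated by XOR with 42, as glibc's memfrob does. Second, each record is terminated by a space byte. The name frobcmp_fold is hypothetical. The detail that matters is converting the decrypted byte to unsigned char before calling toupper.

```c
#include <ctype.h>

/* Hypothetical -f comparator.  ASSUMES bytes are frobnicated with XOR 42
   and that each record ends with an (unencrypted) space byte.  Returns a
   negative, zero, or positive value, like strcmp. */
static int frobcmp_fold(char const *a, char const *b)
{
    for (;; a++, b++) {
        int a_end = (*a == ' '), b_end = (*b == ' ');
        if (a_end || b_end)
            return b_end - a_end;          /* a proper prefix sorts first */
        /* Decrypt, then convert to unsigned char so the value is in
           0..UCHAR_MAX and therefore a valid argument to toupper. */
        int ca = toupper((unsigned char) (*a ^ 42));
        int cb = toupper((unsigned char) (*b ^ 42));
        if (ca != cb)
            return ca - cb;
    }
}
```

Suppose the records are kept in an array of char pointers and sorted with qsort. The qsort comparison function would then convert each const void * argument to a char const * const *, dereference it, and pass the two record pointers to frobcmp_fold.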

Call the rewritten program sfrobu. Measure any differences in performance between sfrob and sfrobu using the time command. Run your program on inputs of varying numbers of input lines (say zero, 100, 10,000, and a million lines), and estimate the CPU time cost as a function of the number of input lines.
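One hedged way to turn the measurements into an estimate is to fit the constants of a simple cost model to the measured CPU times. The model assumes that run time is dominated by a linear pass over the input plus a comparison sort such as qsort, which typically performs on the order of n log n comparisons:

```latex
T(n) \approx c_0 + c_1\, n + c_2\, n \log_2 n
```

Here n is the number of input lines and c_0, c_1, c_2 are constants to be fitted. Suppose the n log n term dominates. Then going from 10,000 to 1,000,000 lines should multiply the CPU time by about 100 × log2(10^6)/log2(10^4) = 100 × 6/4 = 150, rather than by the factor of 100 that a purely linear cost would give.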

Also, suppose the assignment were changed so that the rewritten program also had to use system calls rather than <stdlib.h> memory allocation functions such as malloc, realloc, and free. Which system calls would the program use, and what would the calls' arguments look like? Use strace on your sfrobu runs on inputs of varying size to deduce what system calls sfrobu uses to allocate memory (or to reallocate memory if the input file grows while sfrobu is running).
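For this hypothetical variant, the natural candidates are the calls that malloc itself relies on. On Linux with glibc, malloc typically extends the heap with brk for smaller requests. For large requests it uses mmap; the default threshold is about 128 KiB, and glibc adjusts it at run time. free may return large blocks with munmap. These are the calls the strace runs are likely to show, though the exact pattern depends on the C library. Below is a minimal Linux-only sketch of allocating without <stdlib.h>; the helper names are hypothetical, and mremap is Linux-specific.

```c
#define _GNU_SOURCE            /* for mremap and MREMAP_MAYMOVE (Linux-specific) */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical malloc replacement: an anonymous, private, read/write mapping. */
static void *sys_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical realloc replacement.  MREMAP_MAYMOVE lets the kernel relocate
   the mapping if it cannot grow in place; the contents move with it. */
static void *sys_grow(void *p, size_t old_size, size_t new_size)
{
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}

/* Hypothetical free replacement.  Unlike free, munmap needs the size. */
static int sys_free(void *p, size_t size)
{
    return munmap(p, size);
}
```

Because the caller must remember each region's size, the program would have to track the buffer's capacity itself. Because mremap may return a different address, any pointers into the old buffer must be recomputed after growing it, such as record pointers into the input.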

Submit

Submit the following files separately (not in a tarball):

All files should be ASCII text files, with no carriage returns, and with no more than 200 columns per line. The C source files should contain no more than 132 columns per line. The shell commands

expand report.txt |
  awk '/\r/ || 200 < length'
expand tr2b.c tr2u.c sfrobu.c |
  awk '/\r/ || 132 < length'

should output nothing.