You are a programmer for Big Data Systems, Inc., a company that specializes in large backend systems that analyze big data. Much of BDSI's computation occurs in a cloud or a grid. Computational nodes are cheap SMP hosts with a relatively small number of processors. Nodes typically run simple shell scripts as part of the larger computation, and you've been assigned the job of improving the infrastructure for these scripts.
Many of the shell scripts have command sequences that look like the following (though the actual commands tend to be more proprietary):
(sort < a | cat b - | tr A-Z a-z > c) 2>> d
This command invokes three subcommands. The first runs the command sort with standard input being the file a and standard output being an unnamed pipe 1. The second runs the command cat b - with standard input being pipe 1 and standard output being pipe 2. The third runs the command tr A-Z a-z with standard input being pipe 2 and standard output being the file c. All three commands have standard error sent, via the same file descriptor, to file d in append-only mode.
BDSI's developers have several complaints about these shell scripts:
To address these issues, your boss proposes a new program simpsh, short for "SIMPleton SHell", a very simple, stripped-down shell. simpsh does not use a scripting language at all, and you do not interact with it at a terminal or give it a script to run. Instead, developers invoke the simpsh command by passing it arguments telling it which files to access, which pipes to create, and which subcommands to invoke. It then creates or accesses all the files, creates all the pipes and processes needed to run the subcommands, and reports the processes' exit statuses as they exit.
For example, the abovementioned command in the standard shell could be run using the following simpsh command. This invocation uses standard shell syntax, because it is invoking simpsh from the standard shell; the command itself, though, is just an array of strings and simpsh interprets this array and executes the same three subcommands that the abovementioned shell command does.
simpsh \
  --rdonly a \
  --pipe \
  --pipe \
  --creat --trunc --wronly c \
  --creat --append --wronly d \
  --command 3 5 6 tr A-Z a-z \
  --command 0 2 6 sort \
  --command 1 4 6 cat b - \
  --wait
This example invocation creates seven file descriptors:
It then creates three subprocesses:
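Each --command option in the example names three file numbers followed by a command and its arguments: --command 3 5 6 tr A-Z a-z runs tr with standard input, output, and error taken from file numbers 3, 5, and 6. A minimal sketch of starting one such subprocess with fork, dup2, and execvp follows. The name run_command, the plain int array used as the file table, and the exit code 127 after a failed exec are assumptions for illustration, not part of the assignment.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] with stdin/stdout/stderr taken from
   file numbers i, o, e of the table fds[0..nfds-1].
   Returns the child's pid, or -1 on a bad file number or fork failure. */
static pid_t run_command(const int *fds, size_t nfds,
                         size_t i, size_t o, size_t e, char **argv)
{
    if (i >= nfds || o >= nfds || e >= nfds ||
        fds[i] < 0 || fds[o] < 0 || fds[e] < 0)
        return -1;                      /* out of range or never opened */

    pid_t pid = fork();
    if (pid != 0)
        return pid;                     /* parent, or -1 if fork failed */

    /* Child: make file numbers i, o, e its standard streams. */
    if (dup2(fds[i], 0) < 0 || dup2(fds[o], 1) < 0 || dup2(fds[e], 2) < 0)
        _exit(127);

    /* Close the table's own descriptors so that pipe ends held only by
       this child do not keep a reader from ever seeing end-of-file.
       Assumes simpsh started with 0-2 open, so table entries are above 2;
       the check keeps the freshly installed std streams open regardless. */
    for (size_t k = 0; k < nfds; k++)
        if (fds[k] > 2)
            close(fds[k]);

    execvp(argv[0], argv);
    perror(argv[0]);                    /* goes to the new stderr, file e */
    _exit(127);                         /* exit code is an assumption */
}
```

Note that the parent still holds every descriptor in the table, so a reader on a pipe will not see end-of-file until simpsh itself also closes that pipe's write end.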
It then waits for all three subprocesses to finish. As each finishes, it prints its exit status, followed by the command and arguments. The output might look like this:
0 sort
0 cat b -
0 tr A-Z a-z
although not necessarily in that order, depending on which order the subprocesses finished.
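A sketch of the waiting step, assuming simpsh keeps a hypothetical struct child record (pid plus argument vector) for every subprocess it successfully started. Calling waitpid(-1, ...) collects children in whatever order they finish, which is why the output order can vary. This section does not say how to report a child killed by a signal, so the sketch simply does not print a line for one.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping record for one started subcommand. */
struct child { pid_t pid; char **argv; };

/* Wait for all n children in completion order, printing
   "<status> <command> <args...>" for each one that exits normally.
   Returns the largest exit status seen. */
static int wait_and_report(const struct child *kids, size_t n)
{
    int max = 0;
    for (size_t left = n; left > 0; ) {
        int st;
        pid_t pid = waitpid(-1, &st, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            break;                      /* e.g. ECHILD: nothing left */
        }
        for (size_t k = 0; k < n; k++) {
            if (kids[k].pid != pid)
                continue;
            left--;
            if (WIFEXITED(st)) {
                int code = WEXITSTATUS(st);
                if (code > max)
                    max = code;
                printf("%d", code);
                for (char *const *a = kids[k].argv; *a; a++)
                    printf(" %s", *a);
                putchar('\n');
                /* Flush so a later fork cannot duplicate buffered output. */
                fflush(stdout);
            }
            break;
        }
    }
    return max;
}
```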
Here is a detailed list of the command-line options that simpsh should support. Each option should be executed in sequence, left to right.
First are the file flags. These flags affect the next file that is opened. They are ignored if no later file is opened. Each file flag corresponds to an oflag value of open; the corresponding oflag value is listed after the option. Also see Opening and Closing Files and Open-time Flags in the GNU C Library manual.
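Because a file flag only affects the next file that is opened, one way to implement it is to accumulate pending oflag bits and hand them to the next file-opening option. This sketch covers only the three file flags that appear in the example invocation (--append as O_APPEND, --creat as O_CREAT, --trunc as O_TRUNC); the helper names are hypothetical and the full flag list is not reproduced.

```c
#include <fcntl.h>
#include <string.h>

/* Only the file flags used in the example invocation; not the full table. */
static const struct { const char *name; int oflag; } file_flags[] = {
    { "--append", O_APPEND },
    { "--creat",  O_CREAT  },
    { "--trunc",  O_TRUNC  },
};

/* oflag bits seen since the last file-opening option. */
static int pending_oflags = 0;

/* If name is a known file flag, remember its bit and return 1; else 0. */
static int note_file_flag(const char *name)
{
    for (size_t k = 0; k < sizeof file_flags / sizeof file_flags[0]; k++)
        if (strcmp(name, file_flags[k].name) == 0) {
            pending_oflags |= file_flags[k].oflag;
            return 1;
        }
    return 0;
}

/* Called by a file-opening option: combine its own access mode
   (e.g. O_WRONLY) with the pending flags, then clear them so they
   affect only this one file. */
static int take_oflags(int access_mode)
{
    int flags = access_mode | pending_oflags;
    pending_oflags = 0;
    return flags;
}
```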
Second are the file-opening options. These options open files. Each file-opening option also corresponds to an oflag value, listed after the option. Each opened file is given a file number; file numbers start at 0 and increment after each file-opening option. Normally they increment by 1, but the --pipe option causes them to increment by 2.
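A sketch of the file-number bookkeeping, assuming a growable array (so there is no fixed limit on the number of files) that maps each file number to an OS descriptor. Several details are assumptions rather than requirements stated here: the read end of a pipe gets the lower number (consistent with the example), a failed open still consumes a file number (stored as -1) on a literal reading of "increment after each file-opening option", and newly created files get mode 0644. The names struct ftable, ft_open, and ft_pipe are hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Growable table: fd[n] is the OS descriptor for simpsh file number n. */
struct ftable { int *fd; size_t len, cap; };

/* Append one descriptor; return its file number, or -1 if out of memory. */
static long ft_add(struct ftable *t, int fd)
{
    if (t->len == t->cap) {
        size_t ncap = t->cap ? 2 * t->cap : 8;
        int *p = realloc(t->fd, ncap * sizeof *p);
        if (!p)
            return -1;
        t->fd = p;
        t->cap = ncap;
    }
    t->fd[t->len] = fd;
    return (long) t->len++;
}

/* File-opening option: consumes one number even if open fails
   (the entry is then -1; the caller reports the error). */
static long ft_open(struct ftable *t, const char *path, int oflags)
{
    int fd = open(path, oflags, 0644);  /* creation mode is an assumption */
    long n = ft_add(t, fd);
    if (n < 0 && fd >= 0)
        close(fd);
    return n;
}

/* --pipe: consumes two numbers, read end first. Returns the lower one. */
static long ft_pipe(struct ftable *t)
{
    int p[2];
    if (pipe(p) < 0)
        p[0] = p[1] = -1;               /* still consume both numbers */
    long r = ft_add(t, p[0]);
    if (r >= 0 && ft_add(t, p[1]) < 0) {
        t->len--;                       /* undo so numbering stays in step */
        r = -1;
    }
    if (r < 0 && p[0] >= 0) {
        close(p[0]);
        close(p[1]);
    }
    return r;
}
```

Applied to the example invocation, this numbering gives a the number 0, the first pipe 1 (read end) and 2 (write end), the second pipe 3 and 4, c the number 5, and d the number 6, which is why --command 0 2 6 sort reads from a and writes into the first pipe.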
Third are the subcommand options:
Finally, there are some miscellaneous options:
When there is a syntax error in an option (e.g., a missing operand), when a file cannot be opened, or when there is some other error in a system call, simpsh should report a diagnostic to standard error and continue to the next option. However, simpsh should ignore any write errors to standard error, so that it does not get into an infinite loop outputting write-error diagnostics.
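One way to meet both requirements is a single reporting helper that records the failure for the final exit status and deliberately ignores the result of writing to standard error. The names diag, diag_errno, and option_failed are hypothetical.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Set whenever any option fails; consulted when choosing the exit status. */
static int option_failed = 0;

/* Print "simpsh: <message>" to stderr and record the failure.
   Return values of the stdio calls are deliberately ignored: if stderr
   itself is broken, reporting that would only cause another failing write. */
static void diag(const char *fmt, ...)
{
    va_list ap;
    option_failed = 1;
    va_start(ap, fmt);
    fputs("simpsh: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
}

/* Wrapper for a failed system call; errno is saved before any stdio
   call has a chance to overwrite it. */
static void diag_errno(const char *what)
{
    int err = errno;
    diag("%s: %s", what, strerror(err));
}
```

After calling either helper, the option loop simply moves on to the next option.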
When simpsh exits other than in response to a signal, it should exit with status equal to the maximum of the exit statuses of all the subcommands that it ran and successfully waited for. However, if there are no such subcommands, or if the maximum is zero, simpsh should exit with status 0 if all options succeeded, and with status 1 if any of them failed. For example, if a file could not be opened, simpsh must exit with nonzero status.
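The exit-status rule reduces to a small function. This sketch assumes the caller has collected the exit statuses of the subcommands it successfully waited for, plus a flag recording whether any option failed; final_status is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* statuses: exit statuses of subcommands simpsh ran and waited for.
   any_failed: true if any option failed. */
static int final_status(const int *statuses, size_t n, bool any_failed)
{
    int max = 0;
    for (size_t k = 0; k < n; k++)
        if (statuses[k] > max)
            max = statuses[k];
    if (max > 0)
        return max;
    /* No waited-for subcommands, or all of them exited with 0. */
    return any_failed ? 1 : 0;
}
```

For example, statuses {0, 3, 1} give 3 whether or not an option failed, while statuses {0, 0} give 1 if an option failed and 0 otherwise.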
Your implementation will take three phases:
Before charging ahead and implementing, you should be familiar with the man pages for close, dup2, execvp, fork, getopt_long, open, pipe, and sigaction.
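Since each option is executed in sequence, left to right, getopt_long can drive a loop that hands back one option at a time. This sketch assumes only a handful of the options, hypothetical OPT_* codes, and a hypothetical next_option wrapper; it treats every word after --command, up to the next argument beginning with "--", as that command's operands. The leading '+' in the option string is a GNU getopt feature that disables argument permutation, and setting optind to 0 is the glibc way to force getopt to reinitialize.

```c
#include <getopt.h>
#include <string.h>

/* Hypothetical codes for a few of simpsh's options. */
enum { OPT_RDONLY = 1, OPT_WRONLY, OPT_PIPE, OPT_COMMAND, OPT_WAIT };

static const struct option long_opts[] = {
    { "rdonly",  required_argument, NULL, OPT_RDONLY  },
    { "wronly",  required_argument, NULL, OPT_WRONLY  },
    { "pipe",    no_argument,       NULL, OPT_PIPE    },
    { "command", no_argument,       NULL, OPT_COMMAND },
    { "wait",    no_argument,       NULL, OPT_WAIT    },
    { NULL, 0, NULL, 0 }
};

/* Return the next option's code ('?' on a syntax error, -1 at the end)
   and point *args / *nargs at its operands. Set optind = 0 before the
   first call. */
static int next_option(int argc, char **argv, char ***args, int *nargs)
{
    static char *arg;                   /* stable copy of optarg */
    /* '+' keeps GNU getopt from moving a command's words ("tr", "A-Z",
       "-", ...) to the end of argv. */
    int c = getopt_long(argc, argv, "+", long_opts, NULL);
    *args = NULL;
    *nargs = 0;
    if (c == OPT_COMMAND) {
        int start = optind;
        /* Consume "i o e cmd args..." up to the next "--" option. */
        while (optind < argc && strncmp(argv[optind], "--", 2) != 0)
            optind++;
        *args = argv + start;
        *nargs = optind - start;
    } else if (c != -1 && c != '?' && optarg != NULL) {
        arg = optarg;
        *args = &arg;
        *nargs = 1;
    }
    return c;
}
```

Two limitations are worth knowing: a command argument that itself begins with "--" would end the operand list early, and getopt_long accepts the following word as a required argument even if it looks like an option, so --rdonly --pipe would try to open a file named "--pipe" unless the handler checks for that.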
Your program should come with a file named Makefile that supports the following actions.
'make' builds the simpsh program.
'make clean' removes the program and all other temporary files and object files that can be regenerated with 'make'.
'make check' tests the simpsh program on test cases that you design. You should have at least three test cases.
'make dist' makes a software distribution compressed tarball lab1-yourname.tar.gz and does some simple testing on it. This tarball is what you should submit via CCLE. All the files in the tarball should have names of the form lab1-yourname/..., and one of the files should be the README file described below.
Your solution should be written in the C programming language.
Your code should be robust; for example, it should not impose an arbitrary limit like 2^16 bytes on the length of a string. You may use the features as implemented on the SEASnet GNU/Linux servers running RHEL 7. Please prepend /usr/local/cs/bin to your PATH to get the versions of the tools that we will use to test your solution. Your solution should stick to the standard GNU C library that is installed on SEASnet, and should not rely on other libraries.
You can test your program by running it directly.
Eventually, you should put your own test cases into a file and run it automatically as part of 'make check'.
After you implement Lab 1a, submit via CCLE the .tar.gz file that is built by 'make dist'. Similarly for 1b and 1c. Your submission should contain a README file that briefly describes known limitations of your code and any extra features you'd like to call our attention to.
We will check your work on each lab part by running it on the SEASnet GNU/Linux servers, so make sure your code works there. Lab 1 parts are due at different times, but we will not grade each part separately; the lab grade is determined by your overall work on all three parts.
Here are some suggestions for design problems, if you have been assigned a design problem for Lab 1. You may implement one of them, or design your own. If you design your own, get approval from us before committing significant work to it. Your implementations should include test cases.
For Lab 1a:
For Lab 1b:
For Lab 1c: