Homework 3. Parallelized word counter

Background

You are working for Retrospective Software, Inc. (RSI), a reverse-engineering software firm. One of the jobs your applications often need to do is to count the number of lines in source code and documentation, in order to do cost estimation and the like. Currently you use the Unix wc program to do this, using its -l option. For example, assuming the shell prompt is $, you might do something like this:

$ wc -l *.java
  16 Factorial.java
   5 ParallelSimulator.java
  13 RunJohnny.java
  58 Summary.java
   6 Test.java
  98 total

But wc is sequential. Nowadays you have machines with many cores and you want your analyses to run faster. So you decide to write a parallelized version of wc.

Assignment

Write a Java program Wc.java that behaves like wc -l, except it uses a separate thread for each input file. The output should resemble that of wc, except the order of the output lines may differ, depending on which thread is scheduled first. However, the "total" line must always come last.

You need not worry about the exact number of spaces at the start of each output line. Also, you need not worry about the case where wc reads standard input because it is given no operands or is given an operand of -. You may therefore assume that Wc will be given at least one explicit input file to read.

Measure the performance of your program, as compared to the standard Unix wc program, using the Unix time command. For example, on lindbrook.seas.ucla.edu, assuming bash (which prompts with $), here is how to time wc:

$ time wc -l /opt/SUNWvts/lib/data/*
    1025 /opt/SUNWvts/lib/data/2dprimitives.cksum
    6258 /opt/SUNWvts/lib/data/2dprimitives.zs
       0 /opt/SUNWvts/lib/data/3dTpRGBA8Tex1.tex
…
  207738 /opt/SUNWvts/lib/data/zulu_tp.zs
 1164464 total

real    0m1.482s
user    0m0.680s
sys     0m0.250s

Assess your work by writing an after-action report that summarizes why you solved the problem the way you did, alternative approaches that you considered and rejected (and why you rejected them), and any weaknesses in the code that you submitted. Focus in particular on any problems you foresee as the problem scales in size. This report should be a simple ASCII text file that consumes at most three pages.

Your implementation should operate correctly under Java Standard Edition 6 Update 1. There is no need to run on older Java versions. Your program should compile cleanly, without any warnings.

To run Java SE 6u1 on SEASnet, you can prepend /usr/local/cs/bin to your PATH, using the same technique you used for the previous homeworks. The command java -version should output the following text:

java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode)

Since this is a 64-bit version of Java, you should be fair by also testing with a 64-bit version of wc. The command wc --version should output the following text:

wc (GNU coreutils) 6.9
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin and David MacKenzie.

Submit

To turn in your assignment, submit a single jar file hw3.jar that contains both your Java source code and a plain text file README.txt that holds your assessment. Do not submit class files. Also, please do not put your name, student ID, or other personally identifying information in your files. Before submitting hw3.jar you should test it using the following shell commands on SEASnet:

# Make a fresh directory and change into it.
mkdir testdir
cd testdir

# Extract your program.
jar xf ../hw3.jar

# Make sure you have a README.txt file.
ls -l README.txt

# Build your modified version.
javac `find . -name '*.java'`

# Run your modified version
java Wc `find . -name '*.java'`

Hint

Use CyclicBarrier. For an introduction to CyclicBarrier, see Core Java Technologies Tech Tips (2005-02-16).


© 2007 Paul Eggert. See copying rules.
$Id: hw3.html,v 1.38 2007/04/23 08:06:56 eggert Exp $