Homework 3. Multithreaded grep

Background

The Unix grep program finds lines that match a regular expression. A Java version of grep is available, but it's slower than GNU grep.

Assignment

Make this implementation run faster by running it in a multithreaded way. Split each input file into two equal-sized regions, and look for instances of the regular expression in each region, in a separate thread. The second thread should not output any lines until the first thread is done; instead, it should simply record the matching lines internally, so that it can operate in parallel with the first thread, and wait until the first thread is done before outputting anything based on its records. If an input line straddles the boundary between the two regions, or begins at the very start of the second region, the first thread (and not the second thread) should output the line.

Call your program Grep2.

Measure the performance of the original Java grep, compared to your modified version, and compare both to GNU grep. For your measurement platform, use 64-bit Solaris 8 sparc, on either landfair.seas.ucla.edu or westholme.seas.ucla.edu (these are equivalent two-processor machines; they are both old and slow, to make measuring easier!), with /usr/local/cs/bin at the start of your PATH, as before. Try the following benchmark with GNU grep:

LC_ALL='C' time grep -n 'a.*e.*i.*o.*u.*w.*y' /dev/null /usr/dict/words

and compare it to the original Java version:

LC_ALL='C' time java Grep 'a.*e.*i.*o.*u.*w.*y' /dev/null /usr/dict/words

as well as to your modified Grep2 program. Try at least one other benchmark as well, of your own design.

Assess your work by writing an after-action report that summarizes the performance that you observed. Focus in particular on any problems you foresee as the file size and the number of threads scales up, and which method you expect to work better in general (even if you couldn't measure the difference on SEASnet). This report should be a simple ASCII text file that consumes at most three pages.

Your implementation should operate correctly under Java Standard Edition 6 Update 18. There is no need to run on older Java versions. Your program should compile cleanly, without any warnings.

If your PATH is set correctly, the command "java -version" should output the following text:

java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

and the command "grep --version" should output "GNU grep 2.5.4" followed by some licensing information.

Submit

To turn in your assignment, submit a single jar file hw3.jar that contains both your Java source code and a plain text file README.txt that holds your assessment. Do not submit class files. Also, please do not put your name, student ID, or other personally identifying information in your files. Before submitting hw3.jar you should test it using the following shell commands on SEASnet:

# Make a fresh directory and change into it.
mkdir testdir
cd testdir

# Extract your program.
jar xf ../hw3.jar

# Make sure you have a README.txt file.
ls -l README.txt

# Build your modified version.
javac `find . -name '*.java'`

# Run your modified version
java Grep2 'a.*e.*i.*o.*u' /dev/null /usr/dict/words

Hint

Core Java Technologies Tech Tips (2005-02-16).


© 2010 Paul Eggert. See copying rules.
$Id: hw3.html,v 1.54 2010/01/26 08:29:19 eggert Exp $