Home
Research Publications
Teaching Students CV Software Funding Activities

Program Understanding

Software engineers commonly encounter scenarios where they are asked to work with unfamiliar software systems that are too large in scope for them to fully comprehend within the time given to complete a requested task.  The ability to quickly locate the relevant code and safely perform the required modifications is a crucial skill.  In this laboratory, we will revisit the Saros project, using it as a sample large-scale project within which basic source code comprehension will be necessary for completing the given exercises.

Problem Areas

Note: To ensure that line numbers and referenced code matches, use the Saros code snapshot available from the class svn repository.
For this lab we'll bring together two previous visited topics: Saros and code smells.  Specifically, we will be sniffing out bad code smells within the Saros code that will help us simulate the process of diving into a large and unfamiliar code base and determining how to go about modifying the code to fix a given issue.  Saros, like nearly any significant body of code developed over a non-trivial amount of time, has a few warts.  This part of the lab will contain exercises that involve finding given code smells and proposing the changes necessary to remove the issue.

Duplicate Code

Code that has been copy/pasted into multiple locations in a system causes massive maintenance headaches.  Future changes to one instance of a duplication almost always should be made to the other location as well, but often are not.  This oversight is frequently made because the person making the change does not know that the duplication exists elsewhere, or because they just plain forgot.  Instead, the "once and only once" principle should be applied, namely: behaviors should be declared in the code only once.  This principle is violated multiple places in the Saros code, so we will take a look at an example.

The following code exists in both the "OutgoingSessionNegotiation" and "OutgoingProjectNegotiation" classes (lines 372-473 and 241-342 respectively) within the "de.fu_berlin.inf.dpp.invitation" package:
    }

public void localCancel(String errorMsg, CancelOption cancelOption) {
if (!cancelled.compareAndSet(false, true))
return;
log.debug("Inv" + Utils.prefix(peer) + ": localCancel: " + errorMsg);
if (monitor != null)
monitor.setCanceled(true);
cancellationCause = new LocalCancellationException(errorMsg,
cancelOption);
}

/**
* Cancels the invitation process based on the exception stored in
* {@link #cancellationCause}. This method is always called by this local
* object, so even if an another object "cancels" the invitation (
* {@link #localCancel(String, CancelOption)}, {@link #remoteCancel(String)}
* ), the exceptions will be thrown up on the stack to the caller of
* {@link #start(SubMonitor)}, and not to the object which has "cancelled"
* the process. The cancel methods (
* {@link #localCancel(String, CancelOption)}, {@link #remoteCancel(String)}
* ) do not cancel the invitation alone, but they set the
* {@link #cancellationCause} and cancel the {@link #monitor}. Now it is the
* responsibility of the objects which use the {@link #monitor} to throw a
* {@link SarosCancellationException} (or it's subclasses), which will be
* caught by this object causing a call to this method. If this does not
* happen, the next {@link #checkCancellation(CancelOption)} cancels the
* invitation.
*/
protected void executeCancellation() throws SarosCancellationException {

log.debug("Inv" + Utils.prefix(peer) + ": executeCancellation");
if (!cancelled.get())
throw new IllegalStateException(
"executeCancellation should only be called after localCancel or remoteCancel!");

String errorMsg;
String cancelMessage;
if (cancellationCause instanceof LocalCancellationException) {
LocalCancellationException e = (LocalCancellationException) cancellationCause;
errorMsg = e.getMessage();

switch (e.getCancelOption()) {
case NOTIFY_PEER:
transmitter.sendCancelInvitationMessage(peer, errorMsg);
break;
case DO_NOT_NOTIFY_PEER:
break;
default:
log.warn("Inv" + Utils.prefix(peer)
+ ": This case is not expected here.");
}

if (errorMsg != null) {
cancelMessage = "Invitation was cancelled locally"
+ " because of an error: " + errorMsg;
log.error("Inv" + Utils.prefix(peer) + ": " + cancelMessage);
monitor.setTaskName("Invitation failed. (" + errorMsg + ")");
} else {
cancelMessage = "Invitation was cancelled by local user.";
log.debug("Inv" + Utils.prefix(peer) + ": " + cancelMessage);
monitor.setTaskName("Invitation has been cancelled.");
}

} else if (cancellationCause instanceof RemoteCancellationException) {
RemoteCancellationException e = (RemoteCancellationException) cancellationCause;

errorMsg = e.getMessage();
if (errorMsg != null) {
cancelMessage = "Invitation was cancelled by the remote user "
+ " because of an error on his/her side: " + errorMsg;
log.error("Inv" + Utils.prefix(peer) + ": " + cancelMessage);
monitor.setTaskName("Invitation failed.");
} else {
cancelMessage = "Invitation was cancelled by the remote user.";
log.debug("Inv" + Utils.prefix(peer) + ": " + cancelMessage);
monitor.setTaskName("Invitation has been cancelled.");
}
} else {
log.error("This type of exception is not expected here: ",
cancellationCause);
monitor.setTaskName("Invitation failed.");
}
sarosSession.returnColor(this.colorID);
invitationProcesses.removeInvitationProcess(this);
throw cancellationCause;
}

/**
* Checks whether the invitation process or the monitor has been cancelled.
* If the monitor has been cancelled but the invitation process has not yet,
* it cancels the invitation process.
*
* @throws SarosCancellationException
* if the invitation process or the monitor has already been
* cancelled.
*/
protected void checkCancellation(CancelOption cancelOption)
throws SarosCancellationException {
if (cancelled.get()) {
log.debug("Inv" + Utils.prefix(peer) + ": Cancellation checkpoint");
throw new SarosCancellationException();
That's a lot of duplication!  Instead, the commonality should be moved into an abstraction layer.  Specifically for this case a new abstract base class should be created to contain the cancellation logic.  Namely an "OutgoingCancellableNegotiation" which contains the duplicated code.  The two classes that previously contained the duplicate code should then inherit from this new base class.  By merging the duplicated code into a single definition we can ensures future changes that are required within this block of code will be automatically applied to both places within the system that this functionality is needed.  Whenever possible the burden of work should be placed earlier in the development process instead of delegating it to later maintainers.
Exercise
Locate a similar chunk of duplicated code within the "de.fu_berlin.inf.dpp.invitation" package.  Describe in detail the refactoring that should be performed to remove this duplication.

Long Method

Each method within a class should be concise and easily understandable.  Long methods work against this principle and make it less clear what the method is suppossed to do.  Clarity is king when developing maintainable code, so many software developers operate under the general design guideline (but not a hard and fast rule) that a method should fit entirely on screen without scrolling.  Saros does a pretty good job of adhering closely to this goal, but there are some methods that are a bit long.  For instance, the "start" method in the "ActivitySequencer" class (found in the "de.fu_berlin.inf.dpp.net.internal" package) weighs in at 121 lines of code.  Digging deeper into how the method works we can see that most of this is due to the method containing an anonymous "TimerTask" subclass.  Anonymous subclasses are good for classes that can be extended simply, but in this case many lines of code are needed to express the desired operation of the subclass.  A separate internal class would be better here.  Indeed, by creating a private internal class within "ActivitySequencer" it becomes clearer what "start" does and also allows for a more natural separation of what the "TimerTask" will do (something currently lacking even the most basic comment documentation).
Exercise
Locate a method that is too large within the "Saros" class in the "de.fu_berlin.inf.dpp" package.  Describe in detail the refactoring that should be done to improve the clarity of the code.

Large Class

As new functionality is added to a system, it is natural for the underlying classes to grow as well.  However, at some point it makes sense to take a step back and determine if the class is doing too much and should be partitioned so that it focuses on solving a primary concern.  This principle of "separation of concerns" is a crucial concept in improving software maintainability.  Looking at the Saros code, we can find several classes that have grown quite large.  Take a look at the "Utils" class in the "de.fu_berlin.inf.dpp.util" package.  This class has grown to over 1100 lines, far too large to quickly understand and work with easily.  It clearly has become a dumping ground for general purpose static methods, we can help improve the clarity here by refactoring the structure and adding additional subcategories of utilities (indeed, the "de.fu_berlin.inf.dpp.util" package already contains some such subcategories of utilities).  Looking at the current contents of the "Utils" class there are several groups of methods that could be moved into their own class.  One example would be the many "runner" methods (e.g.: "runSafeSync", "runSafeAsync", etc.).  Moving these into their own "RunnerUtils" class would help reduce the massive scope of the "Utils" class nicely.
Exercise
Locate a class that is too large within the Saros source code.  Describe in detail the refactoring that should be done to improve the structure and clarity of the code.

Hint: The following command line incantation can help find large classes by listing the 20 largest Java source code files within the current working directory and subdirectories (see the previous lab on command line utilities for a reminder of how this command works).
find . -name '*.java' | xargs wc | sort -n | tail -n20

Cross-cutting Concerns

Up until now in this lab, we have focused our code comprehension and suggested changes on a mostly local scope affecting only a single or handful of classes.  However, often desired new functionality in a system will require more extensive changes to the code.  This section will introduce such a scenario for the Saros project and ask you to craft a design that meets the desired requirements while simultaneously improving the maintainability of the codebase.  Your solution will require you to approach the problem in three steps:
  1. Locate and understand the relevant code as it is currently implemented
  2. Propose a refactoring plan that will enable the change
  3. Propose the required additions that will need to be made given the proposed refactoring
Exercise
As currently implemented, the Saros project is dependent upon using an XMPP (also known as Jabber) server for all the backend communications.  In our hypothetical situation, a client has expressed interest in using a different backend communications infrastructure: IRC (Internet Relay Chat).  They find this more desirable than XMPP because many of them already have IRC infrastructure readily accessible to them and do not want to go to the trouble of deploying their own XMPP service.  Your task:
  1. At a high level, describe how the current backend communication system works in the Saros codebase
  2. Describe what refactoring needs to be performed to provide the level of abstraction necessary to support additional communication backends
  3. Describe the necessary additions to hook into the refactored code proposed above
In your solution be sure to:

Reference materials:

Developed with guidance from Miryung Kim

Copyright (c) 2011 Professor Miryung Kim and a course development TA Evan Grim

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license can be found on the GNU web site here.