Project. Twisted application server herd prototype

Background

Wikipedia and its related sites are based on the Wikimedia Architecture, which uses a LAMP platform based on GNU/Linux, Apache, MySQL, and PHP, using multiple, redundant web servers behind a load-balancing virtual router for reliability and performance. For a brief introduction to the Wikimedia Architecture, please see Mark Bergsma, Wikimedia architecture (2007). For a more extensive discussion, please see Domas Mituzas, Wikipedia workbook, MySQL Users Conference 2007.

While LAMP works fairly well for Wikipedia, let's assume that we are building a new Wikimedia-style service where (1) updates to articles will happen far more often and (2) access will be required via various protocols, not just HTTP. In this new service the application server looks like it will be a bottleneck. From a software point of view, it will be too painful to add nontraditional servers to our application (e.g., for access via cell phones, where the cell phones are frequently broadcasting their GPS locations). From a systems point of view, the response time looks like it will be too slow because the Wikimedia application server is a central bottleneck.

Your team has been asked to look into a different architecture called an "application server herd", where the multiple application servers communicate directly with each other as well as via the core database and caches. The interserver communications are designed for rapidly evolving data (e.g., GPS-based locations), whereas the database server will still be used for more stable data that is accessed less often or that requires transactional semantics. For example, you might have four application servers A, B, C, D such that A talks with B and C, and B and C talk with D, so that if a user's cell phone posts updates about its location to any one of the application servers then the other servers will learn of the change after one or two interserver transmissions, without having to talk to the database.
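The "one or two interserver transmissions" claim for the example topology can be checked with a short breadth-first search. This is only an illustrative sketch in plain Python (no Twisted); the names LINKS and hops_from are made up for this example.

```python
from collections import deque

# Example topology from the text: A talks with B and C; B and C talk with D.
# Links are bidirectional, so each link is listed from both ends.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def hops_from(origin, links=LINKS):
    """Return {server: interserver transmissions needed to reach it from origin}."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in dist:          # first visit is the shortest path in BFS
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist

if __name__ == "__main__":
    for server in LINKS:
        print(server, hops_from(server))
```

For every starting server the largest distance is 2, which matches the description above.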

You've been delegated to look into the Twisted event-driven networking framework as a candidate for replacing part or all of the Wikimedia platform for your application. Your boss thinks that this might be a good match for the problem, since Twisted's event-driven nature should allow an update to be processed and forwarded rapidly to other servers in the herd. However, he doesn't know how well Twisted will really work in practice. In particular, he wants to know how easy it is to write applications using Twisted, and how maintainable and reliable those applications will be. He wants you to dig beyond the hype and really understand the pros and cons of using Twisted.

Assignment

Do some research on Twisted as a potential framework for this kind of application. Your research should include an examination of the Twisted source code and documentation, and a small prototype or example code of your own that demonstrates whether Twisted would be an effective way to implement an application server herd. Please base your research on Twisted 2.5.0 (dated 2007-01-11), even if a newer version comes out before the due date; that way we'll all be on the same page.

Your prototype should consist of four servers (numbered 0 through 3) that communicate with each other. Each server should accept TCP connections from clients that emulate cell phones. A client should be able to send its location to the server by sending a message like this:

IAMAT 13105551212 +27.5916+086.5640


The first field IAMAT is the name of the command where the client tells the server where it is. Its operands are the telephone number including the country code (in this case, +1 310 555 1212), and the latitude and longitude in decimal degrees using ISO 6709 notation. A phone number may be any string of non-space characters. Fields are separated by a single space and do not contain spaces.
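As one possible way to validate such a message, the sketch below splits on single spaces and checks the coordinate field against a signed decimal-degree pattern. It is written in plain Python rather than against the Twisted API, the function name parse_iamat is hypothetical, and the range checks are an assumption about what counts as a valid location.

```python
import re

# Signed decimal-degree pair such as "+27.5916+086.5640".
# Assumption: each part carries an explicit sign, as in the example message.
_COORD = re.compile(r"([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)")

def parse_iamat(line):
    """Return (phone, coord_text, lat, lon), or None if the line is not a valid IAMAT."""
    fields = line.rstrip("\r\n").split(" ")
    # Exactly three non-empty fields; doubled spaces produce empty fields.
    if len(fields) != 3 or fields[0] != "IAMAT" or "" in fields:
        return None
    phone, coord = fields[1], fields[2]
    match = _COORD.fullmatch(coord)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return phone, coord, lat, lon

if __name__ == "__main__":
    print(parse_iamat("IAMAT 13105551212 +27.5916+086.5640"))
```

Keeping the original coordinate text alongside the parsed numbers lets the server echo the IAMAT data back unchanged.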

The server should respond to clients with a message like this:

AT 2 1195285372.563873386 13105551212 +27.5916+086.5640


where AT is the name of the response, 2 is the number of the server that got the message from the client, 1195285372.563873386 is the server's idea of when it got the message from the client, expressed in Unix time (seconds and nanoseconds since 1970-01-01 00:00:00 UTC, ignoring leap seconds), and the remaining fields are a copy of the IAMAT data.
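A response in this format could be assembled as follows. This is a sketch under assumptions: it uses Python 3's time.time_ns(), which is not available in the Python versions contemporary with Twisted 2.5.0, and format_at is a name invented for this example.

```python
import time

def format_at(server_id, phone, coord, now_ns=None):
    """Build an AT line; now_ns is integer nanoseconds since the Unix epoch."""
    if now_ns is None:
        now_ns = time.time_ns()  # assumption: Python 3.7 or later
    secs, nanos = divmod(now_ns, 1_000_000_000)
    # Zero-pad the fraction to nine digits so the timestamp keeps nanosecond form.
    return f"AT {server_id} {secs}.{nanos:09d} {phone} {coord}"

if __name__ == "__main__":
    print(format_at(2, "13105551212", "+27.5916+086.5640", 1195285372563873386))
```

Working with integer nanoseconds avoids the rounding that a floating-point timestamp would introduce in the last few digits.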

Clients can also query for other clients' locations, with a query like this:

WHEREIS 13105551212


The server responds with an AT message in the same format as before, giving the most recent location reported by the client, along with the server that it talked to and the time the server did the talking.
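One way to keep "the most recent location" is a per-phone table that only accepts strictly newer timestamps. The class below is a hypothetical sketch, not part of Twisted. What to reply when a phone has never reported is not specified above, so whereis simply returns None and leaves that decision to the caller.

```python
class LocationTable:
    """Remember the newest AT line seen for each phone number."""

    def __init__(self):
        self._latest = {}  # phone -> (timestamp_ns, AT line)

    def update(self, phone, timestamp_ns, at_line):
        """Store at_line if it is newer than the stored one; return True if stored."""
        current = self._latest.get(phone)
        if current is not None and current[0] >= timestamp_ns:
            return False  # stale or duplicate report
        self._latest[phone] = (timestamp_ns, at_line)
        return True

    def whereis(self, phone):
        """Return the stored AT line for phone, or None if it never reported."""
        entry = self._latest.get(phone)
        return None if entry is None else entry[1]

if __name__ == "__main__":
    table = LocationTable()
    table.update("13105551212", 1195285372563873386,
                 "AT 2 1195285372.563873386 13105551212 +27.5916+086.5640")
    print(table.whereis("13105551212"))
```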

Servers should respond to invalid commands with a line that starts with a question mark (?) and contains any other data that you think useful.
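A command dispatcher with this fallback might look like the sketch below. The function name handle_line, the handlers dictionary, and the choice to echo the offending line after the question mark are all assumptions made for illustration.

```python
def handle_line(line, handlers):
    """Dispatch one input line; reply with a '?' line if it is not a valid command.

    handlers maps a command name to a function that takes the list of operands
    and returns the reply text, raising ValueError for bad operands.
    """
    text = line.rstrip("\r\n")
    fields = text.split(" ")
    handler = handlers.get(fields[0])
    # Unknown command, or empty fields caused by doubled or trailing spaces.
    if handler is None or "" in fields:
        return "? " + text
    try:
        return handler(fields[1:])
    except ValueError:
        return "? " + text

if __name__ == "__main__":
    def whereis(operands):
        if len(operands) != 1:
            raise ValueError("WHEREIS takes one operand")
        return "AT 2 1195285372.563873386 %s +27.5916+086.5640" % operands[0]

    print(handle_line("WHEREIS 13105551212\n", {"WHEREIS": whereis}))
    print(handle_line("HELLO there\n", {"WHEREIS": whereis}))
```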

Servers communicate with each other too, using AT messages or some variant of your design.
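One possible interserver design is simple flooding: a server that learns something new forwards the AT message to every peer except the one it came from, and drops anything that is not newer than what it already holds. The sketch below simulates this in memory, with direct method calls standing in for TCP connections. HerdServer and connect are hypothetical names, and this is only one of many workable designs.

```python
from decimal import Decimal

class HerdServer:
    def __init__(self, name):
        self.name = name
        self.peers = []    # directly connected HerdServer objects
        self.latest = {}   # phone -> (timestamp, AT line)

    def receive(self, at_line, sender=None):
        """Accept an AT line from a client or a peer and flood it onward if new."""
        _at, _origin, stamp, phone, _coord = at_line.split(" ")
        stamp = Decimal(stamp)  # exact, unlike float, at nanosecond precision
        known = self.latest.get(phone)
        if known is not None and known[0] >= stamp:
            return  # stale or duplicate: the flood stops here
        self.latest[phone] = (stamp, at_line)
        for peer in self.peers:
            if peer is not sender:
                peer.receive(at_line, sender=self)

def connect(a, b):
    """Create a bidirectional link between two servers."""
    a.peers.append(b)
    b.peers.append(a)

if __name__ == "__main__":
    a, b, c, d = (HerdServer(n) for n in "ABCD")
    for x, y in [(a, b), (a, c), (b, d), (c, d)]:
        connect(x, y)
    a.receive("AT 0 1195285372.563873386 13105551212 +27.5916+086.5640")
    print(sorted(s.name for s in (a, b, c, d) if "13105551212" in s.latest))
```

The timestamp check is what keeps the flood from circulating forever in a topology that contains a cycle.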

Each server should log its input and output into a file, using a format of your design. You can use the logs' data in your reports.
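Since the log format is left open, the following is just one possibility, using Python's standard logging module rather than Twisted's own logging facilities. The helper names, the "IN"/"OUT" direction labels, and the file name in the usage note are all made up for this sketch.

```python
import logging

def make_server_logger(server_id, handler):
    """Return a logger for one server that writes timestamped lines to handler."""
    logger = logging.getLogger("herd.server%d" % server_id)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep herd traffic out of the root logger
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger

def log_io(logger, direction, peer, line):
    """Record one message; direction is 'IN' or 'OUT', peer names the other end."""
    logger.info("%s %s %s", direction, peer, line.rstrip("\r\n"))
```

For example, make_server_logger(2, logging.FileHandler("server2.log")) would send server 2's records to a file, after which log_io(logger, "IN", "client", "WHEREIS 13105551212") appends one line.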

Write a report that summarizes your research, recommends whether Twisted is a suitable framework for this kind of application, and justifies your recommendation.

Prepare a brief oral presentation of your research, and give it in discussion section. Please make sure your talk fits in the time allotted, as we don't want to cut you off abruptly.

Your research, report, and presentation should focus on language-related issues. For example, how easy is it to write Twisted-based programs that run and exploit application server herds? What are the performance implications of using Twisted? Don't worry about nontechnical issues like licensing, or about management issues like software support and retraining programmers.

Style issues

Your report and talk are an important part of this assignment: they are not mere afterthoughts. Please see Resources for oral presentations and written reports for advice on generating high-quality reports and presentations.

Your report should use standard technical academic style, and should have a title, abstract, introduction, body, recommendations/conclusions, references, and any other sections necessary. Limit your report to at most five pages. Use the USENIX '07 style, which uses a two-column format with 10-point font for most of the text, on an 8½"×11" page; an example of the output format and an example student paper are available.

Your paper is not expected to be just like that example student paper! It was written by a graduate student as a conference paper describing months of full-time research, and it is too long for us. It's merely an example of technical style and layout.

Research mechanics

If you need to run servers on SEASnet to do your research, please let the TA know how many TCP ports you need, and we will allocate them for you. Please do not use TCP ports at random, as that might collide with other students' uses.

You can grab a copy of the Twisted source code from the Twisted web site. You can also find a copy of the Twisted source code in the directory ~eggert/src/Twisted-2.5.0 on SEASnet; a distribution compressed tarball is in the file /u/cs/fac/eggert/src/tarpit/Twisted-2.5.0.tar.bz2.

You can run Twisted on SEASnet by prepending /usr/local/cs/bin to your PATH in the usual way. For example, you can run the chat server (taken from the Twisted code examples) as follows:


# Replace '12000' with your TCP port number as allocated by the TA.
sed 's/1025/12000/' /u/cs/fac/eggert/src/Twisted-2.5.0/TwistedCore-2.5.0/doc/examples/chatserver.py >chatserver.py

# We recommend the -n option to forestall runaway servers.
twistd -n -y chatserver.py

# Now you can telnet to port 12000, from another session,
# to exercise the chat server.  A log should appear on
# your screen.

# Type control-C to terminate the chat server.

Submitting your work

Submit a file named report.pdf containing a copy of your paper in PDF form.

Submit a file named talk.pdf containing a copy of your presentation's visual aids in PDF form. Also submit a file containing the original format of the aids (talk.odp, talk.ppt, etc.), so that we can run all the talks from the same laptop and not waste time connecting your laptop to the projector. Please consult with the TA well before the talk if your visual aids require special software (e.g., if you have a prototype server that you wish to demonstrate).

Submit any other supporting work (e.g., your source code) in a gzipped tar file named project.tgz.

Reference

Abe Fettig, Twisted network programming essentials, O'Reilly (October 2005), ISBN 0-596-10032-9. This is a Safari online book, so its contents should be viewable by anybody on the UCLA campus.


© 2006, 2007 Paul Eggert. See copying rules.
$Id: pr.html,v 1.19 2007/11/27 23:48:02 eggert Exp $