The following papers all fall loosely under the rubric of applied algorithms in several application areas (computer architecture, operating systems, bioinformatics, networking, simulation, web economics).

Computer Architecture

Most of this work was done because I was fortunate to work with a wonderful colleague, Brad Calder, who was open to new ideas. One of the highlights was an ISCA paper on a fairly radical network processor design whose lead author was Tim Sherwood, now on the faculty at UCSB.

A Pipelined Memory Architecture for High Throughput Network Processors, A Proposal for Building 40 Gbps Network Processors using an innovative memory design, Proceedings of the ACM International Symposium on Computer Architecture (ISCA), San Diego, California, June 2003.

Catching Accurate Profiles in Hardware, Extending some of our measurement ideas to hardware profiling for processors, Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2003. Satish was the lead author and is now an architecture professor at Michigan.

Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection Nathan Tuck, Timothy Sherwood, Brad Calder, and George Varghese, Proceedings of the IEEE Infocom Conference 2004. Although a network security paper, this is a very architecture "take" on the problem. All the students and Brad are computer architects. Micro invades Infocom!

Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow Nathan Tuck, Brad Calder, and George Varghese, Proceedings of the 37th Micro, Micro 2004. An architectural response to the problem of worms that were so prevalent in 2004!

Parallelism versus Memory Allocation in Pipelined Router Forwarding Engines, Fan R. K. Chung, Ronald L. Graham, Jia Mao, George Varghese, Theory Comput. Syst. 39(6): 829-849 2006. This paper was inspired by the remarkable 2-port memory based design of John Holst that was used in the Procket Router. Ron and Fan saw it as a new type of bin packing problem. It originally appeared in SPAA 2004 because of the interaction between parallelism and memory allocation in multiprocessor systems.

Sketches and Streaming

My students and I have been fortunate to work with terrific theory folks like Mike Mitzenmacher, Subash Suri, David Eppstein, and Mike Goodrich. Some other sketches are described in the section on router analytics.

New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice, Cristian Estan,George Varghese ACM Transactions Computer Systems 2003. This paper started my interest in sketches. It appeared slightly earlier than Muthu and Cormode's Count-Min Sketch and actually solves a different problem though it uses the same underlying algorithm and is often confused with Count-Min. Multistage filters accurately estimate the bandwidth of heavy-hitters only, while Count-Min provide a less accurate bound, but for any flow chosen after the fact. The theorems that characterize performance are very different.

An Improved Construction for Counting Bloom Filters, Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese, European Symposium on Algorithms (ESA), 2006. A side-effect of our efforts to generalize Bloom Filters to Bloom State Machines.

Bitmap algorithms for counting active flows on high-speed links, Cristian Estan, George Varghese, Michael E. Fisk, IEEE/ACM Trans. Netw. 14(5): 925-937, 2006. Althoug a networking paper, a way to count distinct tuples that is related to but different from Flajolet-Martin's classic scheme for databases.

What's the Difference? Efficient Set Difference without Prior Context , D. Eppstein, M. Goodrich, F. Uyeda, G. Varghese, SIGCOMM 2011. A sketch for set difference that appears to be optimal in computation and communication.

Biff (Bloom Filter) codes: Fast Error Correction for Large Data Sets , M. Mitzenmacher and George Varghese, in submission, a new error correcting code for large data sets that can correct thousands of errors in messages of 1 million words in under a second, much faster than Reed-Solomon and much simpler than Turbo and Raptor Codes. The sketch for set difference is set to work on error correction with a very simple reduction that has some nice consequences.

Compression

Tree bitmap: Hardware Software IP Lookups with Incremental Updates, W. Eatherton, Z. Dittia, and George Varghese, Best compressed IP lookup algorithm we know of, used in Cisco CRS-1, ACM CCR, 2004.

Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines, F. Bonomi, M. Mitzenmacher, R. Panigraphy, S. Singh, G. Varghese, a generalization of Bloom Filters to state machines, SIGCOMM 2006 A Bloom Filter is just the simplest kind of compressed state machine!

Harnessing Memory Redundancy in Virtual Machines. OSDI 2008: 309-322 , Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey M. Voelker, Amin Vahdat (Award Paper, also invited to and appeared in CACM). Virtual Machine memories are often redundant because of copies of the OS are included in each VM. How do we compress across VMs? Things have changed since 2008 because of Address Space Randomization in modern VMs.

EndRE: An End-System Redundancy Elimination Service for Enterprises, B. Aggarwal, A. Akella, A. Anand, A. Balachandran, P. Chitis, C. Muthukrishnan, R. Ramjee, G. Varghese, NSDI 2010. Compressing content as in companies like Riverbed except done end-to-end and not in middleboxes.

SlimGene: Compressing Genomic Sequences using SLIMGENE, C. Kozanitis, C. Saunders, S. Krugylak, V, Bafna, G. Varghese, RECCOMB 2010. Genomes are huge, 250 Gbytes per person. We suggest that lossy compression can gain factors of 10 in disk storage.

Other Algorithms and Lower Bounds

"The Pessimism behind Optimistic Simulation" PADS 94 and Inf Proc. Letters 1995. A new Global Virtual Time (GVT) algorithm for distributed simulation derived from a termination detection protocol due to Jay Misra.

On the Difficulty of Scalably Detecting Network Attacks Kirill Levchenko, Ramamohan Paturi, and George Varghese, Proceedings of the ACM Conference on Computer and Communications Security, Washington, D.C., October 2004. Lower bounds on memory for some network security tasks.

A Uniform Projection Method for Motif Discovery in DNA Sequences, B. Raphael, L. Liu, and George Varghese, IEEE Transactions on Bioinformatics and Computational Biology, 2004. Suggests that deterministic projections based on combinatorial designs can do better than classical randomized projections.

A lower bound for multicast key distribution, Jack Snoeyink, Subhash Suri, George Varghese, Computer Networks 47(3): 429-441 2005.

Lattice games and the Economics of aggregators , P. Jordan, U. Nadav, K. Punera, A. Skrzypacz, G. Varghese, WWW 2012. Presentation at Stanford RAIN seminar . A way to model the entrance of aggregators such as Slashdot and a strategy for them that we call "Frontier Descent". My first taste of game theory: I was fortunate to work with Uri and Andy who helped me understand some of the subtleties of Extensive Form Games and Equilibria. We introduce a new solution concept called satisficing --- inspired by Simon's use of the word in decision theory --- that is less stringent than Nash Equilibria/best response.

Prepared by George Varghese
Last Modified Feb 2012 .