Most of this work was done because I was fortunate to work with a wonderful colleague, Brad Calder, who was open to new ideas. One of the highlights was an ISCA paper on a fairly radical network processor design whose lead author was Tim Sherwood, now on the faculty at UCSB.
A Pipelined Memory Architecture for High Throughput Network Processors, A Proposal for Building 40 Gbps Network Processors using an innovative memory design, Proceedings of the ACM International Symposium on Computer Architecture (ISCA), San Diego, California, June 2003.
Catching Accurate Profiles in Hardware, Extending some of our measurement ideas to hardware profiling for processors, Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2003. Satish was the lead author and is now an architecture professor at Michigan.
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection Nathan Tuck, Timothy Sherwood, Brad Calder, and George Varghese, Proceedings of the IEEE Infocom Conference 2004. Although a network security paper, this is a very architecture "take"
on the problem. All the students and Brad are computer architects. Micro invades
Infocom!
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow Nathan Tuck, Brad Calder, and George Varghese,
Proceedings of the 37th Micro, Micro 2004.
An architectural response to the problem of worms that were so prevalent
in 2004!
Parallelism versus Memory Allocation in Pipelined Router Forwarding
Engines, Fan R. K. Chung, Ronald L. Graham, Jia Mao, George Varghese,
Theory Comput. Syst. 39(6): 829-849 2006. This paper was inspired by the
remarkable 2-port memory based design of John Holst that was used in the Procket
Router. Ron and Fan saw it as a new type of bin packing problem.
It originally appeared in SPAA 2004 because of the interaction between parallelism
and memory allocation in multiprocessor systems.
My students and I have been fortunate to work with terrific theory folks
like Mike Mitzenmacher, Subash Suri, David Eppstein, and Mike Goodrich. Some
other sketches are described in the section on router analytics.
New directions in traffic measurement and
accounting: Focusing on the elephants, ignoring the mice,
Cristian Estan,George Varghese
ACM Transactions Computer Systems 2003. This paper started my interest in
sketches. It appeared slightly earlier than Muthu and Cormode's Count-Min Sketch
and actually solves a different problem though it uses the same underlying
algorithm and is often confused with Count-Min. Multistage filters accurately estimate the bandwidth of heavy-hitters only, while Count-Min provide a less accurate bound, but for any flow chosen after the fact. The theorems that characterize performance are very different.
An Improved Construction for Counting Bloom
Filters,
Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh,
George Varghese,
European Symposium on Algorithms (ESA),
2006. A side-effect of our efforts to generalize Bloom Filters to
Bloom State Machines.
Bitmap algorithms for counting active flows on
high-speed links,
Cristian Estan, George Varghese, Michael E. Fisk,
IEEE/ACM Trans. Netw. 14(5): 925-937,
2006. Althoug a networking paper, a way to count distinct tuples that
is related to but different from Flajolet-Martin's classic scheme for databases.
What's the Difference? Efficient Set Difference without Prior Context , D. Eppstein,
M. Goodrich, F. Uyeda, G. Varghese, SIGCOMM 2011. A sketch for set difference that
appears to be optimal in computation and communication.
Biff (Bloom Filter) codes: Fast Error Correction for Large Data Sets , M. Mitzenmacher
and George Varghese, in submission, a new error correcting code for large data sets
that can correct thousands of errors in messages of 1 million words in under a second, much
faster than Reed-Solomon and much simpler than Turbo and Raptor Codes. The sketch
for set difference is set to work on error correction with a very simple reduction
that has some nice consequences.
Tree bitmap: Hardware Software IP Lookups with Incremental Updates,
W. Eatherton, Z. Dittia, and George Varghese, Best compressed IP lookup algorithm
we know of, used in Cisco CRS-1, ACM CCR, 2004.
Beyond Bloom Filters: From Approximate Membership Checks to Approximate
State Machines, F. Bonomi, M. Mitzenmacher, R. Panigraphy, S. Singh, G.
Varghese, a generalization of Bloom Filters to state machines, SIGCOMM 2006
A Bloom Filter is just the simplest kind of compressed state machine!
Harnessing Memory Redundancy in Virtual Machines. OSDI 2008: 309-322
, Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey M. Voelker, Amin Vahdat (Award Paper, also
invited to and appeared in CACM). Virtual Machine memories are often redundant
because of copies of the OS are included in each VM. How do we compress across
VMs? Things have changed since 2008 because of Address Space Randomization in
modern VMs.
EndRE: An End-System Redundancy Elimination Service for Enterprises, B.
Aggarwal, A. Akella, A. Anand, A. Balachandran,
P. Chitis, C. Muthukrishnan, R. Ramjee, G. Varghese, NSDI
2010. Compressing content as in companies like Riverbed except done end-to-end
and not in middleboxes.
SlimGene: Compressing Genomic Sequences using SLIMGENE, C. Kozanitis,
C. Saunders, S. Krugylak, V, Bafna, G. Varghese, RECCOMB
2010. Genomes are huge, 250 Gbytes per person. We suggest that lossy compression
can gain factors of 10 in disk storage.
"The
Pessimism behind Optimistic Simulation" PADS 94 and Inf Proc. Letters
1995. A new Global Virtual Time (GVT) algorithm for distributed simulation
derived from a termination detection protocol due to Jay Misra.
On the Difficulty of Scalably Detecting Network Attacks Kirill Levchenko, Ramamohan Paturi, and George Varghese, Proceedings of the ACM Conference on Computer and Communications Security, Washington, D.C., October 2004. Lower bounds on memory for some network security tasks.
A Uniform Projection Method for Motif Discovery in DNA Sequences,
B. Raphael, L. Liu, and George Varghese,
IEEE Transactions on
Bioinformatics and Computational Biology,
2004. Suggests that deterministic projections based on combinatorial designs
can do better than classical randomized projections.
A lower bound for multicast key distribution,
Jack Snoeyink, Subhash Suri, George Varghese,
Computer Networks 47(3): 429-441
2005.
Lattice games and the Economics of aggregators , P. Jordan, U. Nadav,
K. Punera, A. Skrzypacz, G. Varghese, WWW 2012.
Presentation at Stanford RAIN seminar . A way to model the entrance of aggregators
such as Slashdot and a strategy for them that we call "Frontier Descent". My
first taste of game theory: I was fortunate to work with Uri and Andy
who helped me understand some of the subtleties of Extensive Form Games and Equilibria.
We introduce a new solution concept called satisficing --- inspired by Simon's use
of the word in decision theory --- that is less stringent than Nash Equilibria/best
response.
Sketches and Streaming
Compression
Other Algorithms and Lower Bounds
Prepared by George Varghese
Last Modified Feb 2012 .