ASPLOS’17
Tutorial: "Systemized" Program Analyses – A "Big Data"
Perspective on Static Analysis Scalability
Overview
How to scale
sophisticated static analyses to large codebases has been a key challenge in the
program analysis research for decades. The inability of scaling is the major
factor that prevents analysis-based techniques (e.g., verification, model checking, and static bug detection) from being
widely adopted in industry. Program analysis researchers tackle the problem
typically by raising the level of abstractions, trading off analysis precision
for scalability. However, higher-level abstractions render analyses less
useful, and, even with new abstractions, most analyses still
cannot scale to modern systems, such as Hadoop, Spark, or Linux. We revisit the
scalability problem of program analysis from a different
perspective: now that efficient systems can be
built to process datasets as large as the whole internet, can we shift our
focus from playing precision-scalability tradeoffs
to leveraging decades of experience in the systems community to build scalable
Big Code analysis systems? In this half-day tutorial, we will present our
vision of "systemizing" program analyses as well as two systems we have built for
scaling static program analyses.
Outline
This tutorial has the following three components. The tutorial
slides will be available shortly.
·
"Big Data" thinking – we will present an overview of large-scale data
analytics systems, as well as their motivation and design. Through these
different systems, we will try to generalize a - what we call - "Big Data" thinking and apply it to
solve problems in computing that do not appear to be data-intensive. We will
use some examples to illustrate how to convert a problem with complex
computation into a "Big Data" problem that has simple computation but operates on a large
dataset.
·
Graspan: A Single-Machine Disk-based Graph System for Scaling
Sophisticated Static Analysis – we will present the design and implementation
of Graspan, and use it as an example to demonstrate
how to develop a Big Data system tailored for dynamic transitive closure
computation used widely in static analysis. Graspan
is a parallel system that is not afraid of memory blowup. It utilizes support
from fast SSDs when memory is not sufficient. Our experiments show that Graspan is much faster than any existing Datalog engine.
·
BigSAT – we will present the design and implementation of
Presenters
The presenters are all from UC Irvine. Harry Xu is an assistant professor of computer
science. He has broad interests in programming languages, systems, data
analytics, and computer architecture. Dr. Zhiqiang Zuo is a postdoc researcher
working with Xu at UCI while Kai Wang and Aftab
Hussain are both Xu’s Ph.D. students.
Scope and Objectives
We
will cover necessary background on program analysis and data analytics. Our objectives are two-fold: (1) the tutorial will help
program analysis researchers think of alternative ways to scale their analysis
algorithms other than playing tradeoffs between precision and scalability; and
(2) it will help promote the future development of more backend systems to support various kinds of program analysis
workloads. We welcome all interested researchers and practitioners who use
static program analysis in their development or design program analysis
algorithms themselves. Systems researchers looking for new application domains
may also find it interesting.
Tentative Schedule
(Slides)
1:30-2:00 |
Overview of
large-scale data analytics systems and program analysis workloads |
2:00-2:30 |
Graspan – Part I |
2:30-3:00 |
Coffee Break |
3:30- 4:00 |
Graspan – Part II |
4:00-5:00 |
Overview of SAT
solving |
Downloads
The Graspan paper will be presented in ASPLOS’17. A preliminary
version is available here.
We have been working on Graspan since
August 2015. We initially built a Java
version that had some performance issues for very large graphs. During spring
and summer of 2016, we had two wonderful undergraduate students John Thorpe
(UCI) and Sungsoo Son (Kookmin
University, Korea) who helped us re-implement Graspan
in C++. Graspan is publicly available at https://github.com/Graspan.
The BigSAT project is still under
development. We will release the paper and system soon.