ASPLOS’17 Tutorial: "Systemized" Program Analyses – A "Big Data" Perspective on Static Analysis Scalability

 

Overview

How to scale sophisticated static analyses to large codebases has been a key challenge in the program analysis research for decades. The inability of scaling is the major factor that prevents analysis-based techniques (e.g., verification, model checking, and static bug detection) from being widely adopted in industry. Program analysis researchers tackle the problem typically by raising the level of abstractions, trading off analysis precision for scalability. However, higher-level abstractions render analyses less useful, and, even with new abstractions, most analyses still cannot scale to modern systems, such as Hadoop, Spark, or Linux. We revisit the scalability problem of program analysis from a different perspective: now that efficient systems can be built to process datasets as large as the whole internet, can we shift our focus from playing precision-scalability tradeoffs to leveraging decades of experience in the systems community to build scalable Big Code analysis systems? In this half-day tutorial, we will present our vision of "systemizing" program analyses as well as two systems we have built for scaling static program analyses.

Outline

This tutorial has the following three components. The tutorial slides will be available shortly.

·       "Big Data" thinking – we will present an overview of large-scale data analytics systems, as well as their motivation and design. Through these different systems, we will try to generalize a - what we call - "Big Data" thinking and apply it to solve problems in computing that do not appear to be data-intensive. We will use some examples to illustrate how to convert a problem with complex computation into a "Big Data" problem that has simple computation but operates on a large dataset.

·       Graspan: A Single-Machine Disk-based Graph System for Scaling Sophisticated Static Analysis – we will present the design and implementation of Graspan, and use it as an example to demonstrate how to develop a Big Data system tailored for dynamic transitive closure computation used widely in static analysis. Graspan is a parallel system that is not afraid of memory blowup. It utilizes support from fast SSDs when memory is not sufficient. Our experiments show that Graspan is much faster than any existing Datalog engine.

·      BigSAT – we will present the design and implementation of BigSAT, a distributed SAT solver based on the DP procedure (i.e., performs explicit resolutions). We will use it as another example to show how Big Data thinking can be applied in software analysis.

Presenters

The presenters are all from UC Irvine.  Harry Xu is an assistant professor of computer science. He has broad interests in programming languages, systems, data analytics, and computer architecture. Dr. Zhiqiang Zuo is a postdoc researcher working with Xu at UCI while Kai Wang and Aftab Hussain are both Xu’s Ph.D. students.

Scope and Objectives

We will cover necessary background on program analysis and data analytics. Our objectives are two-fold: (1) the tutorial will help program analysis researchers think of alternative ways to scale their analysis algorithms other than playing tradeoffs between precision and scalability; and (2) it will help promote the future development of more backend systems to support various kinds of program analysis workloads. We welcome all interested researchers and practitioners who use static program analysis in their development or design program analysis algorithms themselves. Systems researchers looking for new application domains may also find it interesting.

Tentative Schedule (Slides)

1:30-2:00

Overview of large-scale data analytics systems and program analysis workloads
"Big Data" thinking

2:00-2:30

Graspan – Part I

2:30-3:00

Coffee Break

3:30- 4:00

Graspan – Part II

4:00-5:00

Overview of SAT solving
BigSAT Design and Implementation

Downloads

The Graspan paper will be presented in ASPLOS’17. A preliminary version is available here.

 

We have been working on Graspan since August 2015.  We initially built a Java version that had some performance issues for very large graphs. During spring and summer of 2016, we had two wonderful undergraduate students John Thorpe (UCI) and Sungsoo Son (Kookmin University, Korea) who helped us re-implement Graspan in C++. Graspan is publicly available at https://github.com/Graspan.

 

The BigSAT project is still under development. We will release the paper and system soon.