About
Despite the employment of faster CPUs and
larger memory systems, the levels of inefficiencies in real-world programs grow
surprisingly fast and there is an ever-increasing demand for performance
optimization in modern software. Performance and scalability issues are
becoming increasingly critical partly due to the pervasive use of
object-oriented programming languages. The inefficiencies inherent in the
implementation of an object-oriented language as well as the commonly adopted
design and implementation principles in the object-oriented community often
combine to hurt performance. The community-wide recognition of the importance
of abstraction and reuse results in increased emphasis on modular
design, declaration of general interfaces, and use of models and patterns. Programmers
are taught to focus first and foremost on them, taking it for granted that
compilers and run-time systems can remove all the inefficiencies. In a large
program that is typically built on top of many layers of frameworks and
libraries, a small set of inefficiencies can multiply and quickly get magnified
to slow down the system. When the call stack grows to be deep, the usefulness
of the dataflow analyses in a dynamic compiler becomes limited and the
optimizer can no longer remove these inefficiencies. As a result, many
applications suffer from chronic run-time performance problems that
significantly affect scalability and performance. This is a serious problem for
real-world software systems used every day by thousands of businesses. The
pressing need for new optimization techniques can be especially seen as
object-orientation goes everywhere into systems of any size. The extensive use
of object-oriented languages in the development of memory-constrained
applications such as smartphone apps (e.g., Java used in Android and C# used in
Windows phones) and data-intensive systems (e.g., Hadoop, Giraph,
and Hyracks) introduces numerous research
challenges-- these systems have small memory space but large amounts of data to
process and inefficiencies in these systems can be significantly exacerbated.
The burden of reducing unnecessary work should not be only on the shoulder of
hardware designers, especially in the modern era when Moore's dividend becomes
less obvious. It strongly calls for higher-level performance optimization
techniques that can detect and remove inefficiencies for all categories of
object-oriented applications. Our group has an established record on
performance optimization for large-scale systems. Our recent efforts focus on
the following projects:
(1) Providing efficient infrastructures for detecting bloat
A recent piece of work focuses on the development of a tunable
object lifetime profiling technique, called Resurrector
[OOPSLA13]. Many existing
optimization techniques (such as object pooling and pretenuring)
require precise identification of object lifetimes. However, it is particularly
challenging to obtain object lifetimes both precisely and efficiently: precise
profiling techniques such as Merlin introduce several hundred times slowdown
even for small programs while efficient approximation techniques often
sacrifice precision and produce less useful lifetime information. Resurrector
solves the problem by exploring the middle ground between high precision and
high efficiency to find the precision-scalability sweetspot
for various optimization techniques. Resurrector's
design is motivated by an important observation that the scalability bottleneck
of a traditional OLP algorithm (such as Merlin) lies in the need to compute
transitive closures on the dead objects (e.g., Merlin's backward pass).
Resurrector improves efficiency by completely eliminating this need. Similarly
to Merlin, Resurrector first identifies the root dead objects whose reference
counts are zero. Instead of computing transitive closures from them,
Resurrector exploits object caching and reusing to find dead objects
(transitively reachable from the roots) that have non-zero reference counts.
Another infrastructure we have built is a runtime framework that
performs abstract dynamic slicing [PLDI10-a, TOSEM14] to identify performance
problems that manifest themselves in dataflow activities. Abstract dynamic slicing, a technique that applies
dynamic slicing over an abstract domain whose size is limited by bounds
independent of the runtime execution. This technique is embedded in the general
framework parameterized by the abstract domain. The output of this framework is
an abstract dependence graph that contains abstractions of instructions, rather
than their actual runtime instances. This new approach is motivated by the
observation that a client of dynamic slicing often needs to access only a small
portion of the complete execution trace collected by a regular slicing
algorithm and thus tremendous effort is wasted on collecting information not
used by the client. The runtime (space and time) overhead can be significantly
reduced if the slicing algorithm is client aware, that is, it understands what
information would be needed by its client and records only such information
during the execution. Abstract dynamic slicing makes this possible by asking
the analysis developer to provide an abstraction that specifies this knowledge.
(2) Improved performance debugging and testing
Performance
problems in a large-scale application are extremely difficult to find.
Traditional performance test oracles such as time/memory checks are
coarse-grained and subjective; as a result, performance bugs often escape to
production runs, hurting software reliability and user experience. We are in
the process of developing a general technique, called PerfBlower, that can amplify
the effects of a class of performance problems whose symptoms can be
described by logical statements over a history of heap updates as well as
provide precise diagnostic information. Amplification serves as an automated
test oracle because it increases memory consumption significantly for tests
that trigger performance problems while having a very small impact on bug-free
runs. As a result, developers can easily divide tests into successful and
failing runs, and focus their effort on failing tests. Please read our ECOOP'15 paper for details.
Caching/resuing opportunities can often be found in large-scale
applications. A big source of run-time performance problems in large-scale, object-oriented
applications is the frequent creation of data structures whose lifetimes are
disjoint, and whose shapes and data content are always the same. Constructing
these data structures and computing the same data values many times is
expensive; significant performance improvements can be achieved by reusing
their instances, shapes, and/or data values rather than reconstructing them. We
first classify caching/reusing opportunities into three categories: instance
reusability, shape reusability, and data reusability [OOPSLA12]. We next develop scalable
runtime techniques that can quickly detect these opportunities by exploiting
cooperative compiler and runtime system support. For example, work from [OOPSLA12] is a technique that
piggybacks on GC to find reuse opportunities while Cachetor
[FSE13] relies on heavyweight
dependence profiling to understand if data values are cacheable.
(3)
Adaptive selection of algorithms and data structures
An important source of runtime bloat is the
inefficient use of container implementations. Standard libraries of
object-oriented languages such as Java and C# contain collection frameworks
that provide with users, for each abstract data type (such as List),
many different implementations (such as ArrayList
and LinkedList),
each of which features a different design choice suitable for a specific
execution scenario. However, in real-world development, choosing the most
appropriate container implementation is challenging. As a result, developers
tend to keep using the implementations that are most general or well-known
(e.g., HashSet for
Set),
regardless of whether or not they fit the usage context. We develop a novel
container optimization technique, called CoCo, that
is able to (1) determine at run time, for each container instance (e.g., a LinkedList
object) used in the
program, whether or not there exists another container implementation (e.g., ArrayList) that is more suitable for the execution; and (2) automatically
and safely switch to this new container implementation (e.g., replace the old LinkedList
object with a new ArrayList
object online) for increased efficiency.While there
exists work (such as Chameleon and Brainy) that could identify Java collection
inefficiencies and report them to users for offline inspection,
none of these techniques can change implementations online. Details about CoCo can found in the ECOOP'13 paper. In collaboration with the
information system group, we are currently developing techniques that can
automatically select table joining algorithms for message-based Big Data
systems (such as graph processing systems). Details of this project will be
reported later.
(4)
Static and dynamic detection of Java memory leaks
In
managed languages such as Java and C#, developers do not need to worry about
memory correctness issues such as dangling pointers and double free errors. However,
it remains challenging to avoid leaks. A memory leak in a managed language is
caused by keeping unnecessary references to objects that are no longer used.
These objects cannot be reclaimed by the garbage collector (GC), often leading
to severe performance degradation and even program crashes. We have developed
both static and dynamic techniques for memory leak detection. In particular, we
propose LeakChaser [PLDI11], a
specification-based leak detector, that exploits user insight (expressed in the
form of liveness assertions) to narrow down causes of memory leaks. Another
attractive direction is to perform static leak detection because it does not
rely on any leak-triggering inputs, allowing compile-time tools to find leaks
before software is released. A long-standing issue that prevents practical
static memory leak detection for Java is that it can be very expensive to
statically determine object liveness in large applications. We present a
practical static leak detection technique, called LeakChecker
[CGO14] that bypasses this problem by
considering a common leak pattern. In many cases severe leaks occur in loops
where, in each iteration, some objects created by the
iteration are unnecessarily referenced by objects external to the loop. These
unnecessary references are never used in later loop iterations. Based on this
insight, we shift our focus from computing liveness, which is very difficult to
achieve precisely and efficiently for large programs, to the easier goal of
identifying objects that flow out of a loop but never flow back in.
Papers
o
PerfBlower: Quickly Detecting
Memory-Related Performance Problems via Amplification,
Lu Fang, Liang Dou, and Guoqing (Harry) Xu.
ECOOP'15:
European Conference on Object-Oriented Programming.
[Slides]
o
LeakChecker:
Practical Static Memory Leak Detection for Managed Languages,
Dacong Yan, Guoqing (Harry) Xu, Shengqian Yang,
and Atanas Rountev.
CGO'14:
International Conference on Code Generation and Optimization.
[Slides]
o
Scalable Runtime Bloat
Detection Using Abstract Dynamic Slicing,
Guoqing (Harry)
Xu, Nick
Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky.
TOSEM'14, ACM Transactions on Software Engineering and
Methodology.
o
Resurrector: A Tunable
Object Lifetime Profiling Technique for Optimizing Real-World Programs,
Guoqing (Harry)
Xu.
OOPSLA'13, ACM
SIGPLAN Conference on Object-Oriented Programming Systems, Language, and
Applications.
[Slides]
o
Cachetor: Detecting Cacheable Data to Remove Bloat,
Khanh Nguyen and
Guoqing (Harry) Xu.
FSE'13, ACM
SIGSOFT Symposium on the Foundations of Software Engineering.
[Slides]
o
CoCo: Sound and
Adaptive Replacement of Java Collections,
Guoqing (Harry)
Xu.
ECOOP'13, European
Conference on Object-Oriented Programming.
[Slides]
o
Finding Reusable
Data Structures,
Guoqing (Harry)
Xu.
OOPSLA'12, ACM
SIGPLAN Conference on Object-Oriented Programming Systems, Language, and
Applications.
[Slides]
o
Static Detection
of Loop-Invariant Data Structures,
Guoqing (Harry)
Xu, Dacong Yan, and Atanas Rountev.
ECOOP'12, European
Conference on Object-Oriented Programming.
[Slides]
o
LeakChaser: Helping Programmers Narrow Down Causes of
Memory Leaks,
Guoqing (Harry) Xu, Michael D. Bond, Feng Qin,
and Atanas Rountev,
PLDI'11, ACM
SIGPLAN Conference on Programming Language Design and Implementation
[Slides]
o
Finding
Low-Utility Data Structures,
Guoqing (Harry) Xu, Nick Mitchell, Matthew
Arnold, Atanas Rountev, Gary Sevitsky, and Edith
Schonberg,
PLDI'10, ACM
SIGPLAN Conference on Programming Language Design and Implementation
[Slides]
o
Detecting
Inefficiently-Used Containers to Avoid Bloat,
Guoqing (Harry) Xu and Atanas Rountev,
PLDI'10, ACM
SIGPLAN Conference on Programming Language Design and Implementation
[Slides]
o
Go with the Flow:
Profiling Copies To Find Runtime Bloat,
Guoqing (Harry) Xu, Matthew Arnold, Nick
Mitchell, Atanas Rountev, and Gary Sevitsky,
PLDI'09, ACM
SIGPLAN Conference on Programming Language Design and Implementation
[Slides]
People
o
Lu Fang
o
Harry Xu
Software
o
PerfBlower, a performance problem amplification tool
o
LeakChaser, a specification-based memory leak detector
for Java
o
Resurrector:
a tunable object lifetime profiling tool based on Jikes RVM
Support
This research is funded in part by NSF under grants CNS-1321179,
CCF-140982, and CNS-1613023, and by ONR under grants N00014-14-1-0549
and N00014-16-1-2913.