
Research Projects 

The mission of the Software Evolution and Analysis Laboratory is to improve developer productivity.

Automated Debugging and Testing for Big Data Analytics

An abundance of data in science, engineering, national security, and health care has led to the emerging field of big data analytics. The current big data computing model lacks the kinds of debugging features found in traditional desktop computing, forcing data scientists to debug by trial and error. To address this challenge, we designed interactive debugging, data provenance, delta debugging, taint analysis, flow analysis, symbolic execution, and fuzz testing for Apache Spark.
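Of the techniques listed above, delta debugging is the easiest to illustrate in a few lines: it repeatedly splits a failing input into chunks and keeps a smaller chunk (or its complement) whenever the failure persists. The sketch below is a minimal, plain-Python version of the classic ddmin loop over a list of input records. It is not the lab's Spark implementation; the names `ddmin` and `fails` are hypothetical, and `fails` stands in for "run the analytics job on this subset and check whether it still misbehaves".

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `records` to a smaller list on which `fails` still returns True.

    `fails` is a hypothetical test oracle supplied by the caller.
    """
    current = list(records)
    assert fails(current), "the full input must reproduce the failure"
    n = 2  # number of chunks to split the input into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # Everything except the i-th chunk, in the original order.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):
                current, n, reduced = subset, 2, True
                break
            if fails(complement):
                current, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(current):
                break  # single-record granularity already tried
            n = min(len(current), n * 2)  # retry with finer chunks
    return current


# Hypothetical failure: the job breaks only when records 7 and 13 co-occur.
culprits = ddmin(list(range(20)), lambda rs: 7 in rs and 13 in rs)
```

In a big data setting each call to `fails` is a full job run, which is why provenance and other analyses that narrow the search space before this loop starts matter in practice.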

Software Developer Tools for Heterogeneous Computing Applications

Specialized hardware accelerators such as GPUs and FPGAs have become a prominent part of the current computing landscape. However, developing heterogeneous applications is limited to a small subset of programmers with specialized hardware knowledge. To democratize heterogeneous computing, our goal is to design a new wave of refactoring, testing, and debugging tools for heterogeneous application development.

Java Bytecode Debloating for Size Reduction and Security 

Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused. This phenomenon, known as software bloat, makes software consume more resources and unnecessarily enlarges its attack surface.

To address this problem, we developed an end-to-end bytecode debloating framework called JDebloat. It augments traditional static reachability analysis with dynamic profiling, and it accounts for new dynamic language features in modern Java. This work is motivated and sponsored by the Office of Naval Research Total Platform Cyber Protection program and has led to technology transfer to the Navy. Information on debloating can be found here.
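The idea of augmenting static reachability with dynamic profiling can be sketched as follows. This is an illustrative assumption about how the two sources of information might be combined, not JDebloat's actual algorithm: methods observed in a runtime profile are treated as extra roots, so code reached only through reflection or other dynamic features is kept even though the static call graph has no edge to it. The function names and the method names in the example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable(call_graph: Dict[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """Breadth-first search over a caller -> callees map."""
    seen = set(roots)
    queue = deque(seen)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def methods_to_keep(
    static_graph: Dict[str, Set[str]],
    entry_points: Iterable[str],
    profiled_methods: Iterable[str],
) -> Set[str]:
    """Keep everything reachable from entry points or from profiled methods."""
    return reachable(static_graph, set(entry_points) | set(profiled_methods))


# Hypothetical program: Plugin.run is only ever invoked reflectively,
# so no static edge leads to it, but the profiler saw it execute.
graph = {
    "Main.main": {"Parser.parse"},
    "Plugin.run": {"Plugin.helper"},
    "Legacy.export": {"Legacy.format"},
}
keep = methods_to_keep(graph, ["Main.main"], ["Plugin.run"])
all_methods = set(graph) | {m for callees in graph.values() for m in callees}
removable = all_methods - keep
```

Here `removable` contains only the `Legacy` methods. A purely static analysis rooted at `Main.main` would also have discarded `Plugin.run` and `Plugin.helper`, which is the kind of unsound removal that profiling is meant to prevent.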

Data Scientists in Software Teams: Backgrounds, Activities, Tools, Challenges and Best Practices 

I initiated an academia-industry coalition to investigate the emerging role of data scientists. We conducted an in-depth study of the emerging roles of data scientists and a large-scale survey of 793 professional data scientists.
This quantification and sub-categorization of data scientists is important: although many companies are hiring data scientists and universities are creating new graduate programs, we lack a scientific understanding of who data scientists are.
  • The Emerging Roles of Data Scientists ICSE 2016
  • A Large Scale Survey with 793 Data Scientists TSE 2018

Mining, Assessing, and Visualizing Code Examples at Internet Scale

There is a growing interest in leveraging large collections of open-source repositories such as those hosted on GitHub. Currently, it is difficult for a user to understand the commonalities and variations among a massive number of related code examples.
To advance this new frontier of mining software repositories research, we design techniques for ultra-scale API usage mining, interactive visualization, code search, and recommendation.

Code Clone Detection, Management, and Removal 

Code duplication created by copy and paste is common in large software. Changing such software often requires systematic edits: similar but not identical enhancements, refactorings, and bug fixes to many similar methods. We developed novel techniques for example-based program transformation, clone removal, differential testing, and code review.
  • Clone Transplantation and Differential Testing ICSE 2017
  • Interactive Clone Search ICSE 2015
  • Learning Transformation from Multiple Examples ICSE 2013
  • Generating Transformation from a Single Example PLDI 2011
The following techniques find copy-and-paste bugs and reconstruct clone evolution.

Refactoring Automation, Inspection, Testing, and Studies 


Refactoring is a technique for cleaning up legacy code in preparation for bug fixes or feature additions. To create a scientific foundation for refactoring, we quantified the impact of a multi-year Windows re-architecting effort: we analyzed version history data, conducted a survey of over 300 developers, and interviewed the architects and development leads to assess the impact of refactoring on size, churn, complexity, test coverage, failure, and organization metrics.

  • A Field Study of Refactoring at Microsoft FSE 2012, TSE 2014
  • API Refactoring and Bug Fixes ICSE 2011, Nominated for ACM SIGSOFT Distinguished Paper Award
  • API Stability and Adoption ICSM 2013
The following techniques find refactoring bugs.

Logical Program Differencing

CHIME We invented a suite of analysis tools that can help programmers investigate code modifications. We also developed RefFinder, a logic-query approach to refactoring reconstruction. Our insight was that the skeleton of refactoring edits can be expressed as a logical constraint.
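To make the "refactoring as a logical constraint" idea concrete, here is a minimal sketch of one such constraint for a Move Method refactoring: a method deleted from class C1, a method with the same name added to class C2, C1 different from C2, and matching bodies. This is a simplified illustration, not RefFinder's actual rule set or fact base; the `Method` record, the `find_move_method` function, and the use of exact body equality (rather than a similarity measure) are all assumptions made for the example.

```python
from typing import List, NamedTuple, Set, Tuple


class Method(NamedTuple):
    """A fact about one program version: class `cls` declares `name` with `body`."""
    cls: str
    name: str
    body: str


def find_move_method(old: Set[Method], new: Set[Method]) -> List[Tuple[str, str, str]]:
    """Return (method, source class, target class) triples satisfying:

    deleted(m, c1) AND added(m, c2) AND c1 != c2 AND same_body(m)
    """
    deleted = old - new  # facts true only in the old version
    added = new - old    # facts true only in the new version
    return sorted(
        (d.name, d.cls, a.cls)
        for d in deleted
        for a in added
        if d.name == a.name and d.body == a.body and d.cls != a.cls
    )


# Hypothetical versions: `total` moves from Order to Invoice.
old_version = {
    Method("Order", "total", "return sum(items)"),
    Method("Order", "id", "return self._id"),
}
new_version = {
    Method("Invoice", "total", "return sum(items)"),
    Method("Order", "id", "return self._id"),
}
moves = find_move_method(old_version, new_version)
```

Expressing each refactoring type as a query over added and deleted facts means new refactoring types can be supported by writing another constraint rather than another special-purpose detector.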