An abundance of data in science, engineering, national security, and health care has led to the emerging field of big data analytics. The current big data computing model lacks the kinds of debugging features found in traditional desktop computing, forcing data scientists to debug by trial and error. To address this challenge, we designed interactive debugging, data provenance, delta debugging, taint analysis, flow analysis, symbolic execution, and fuzz testing for Apache Spark.
Specialized hardware accelerators like GPUs and FPGAs
have become a prominent part of the current computing landscape.
However, developing heterogeneous applications is limited to
a small subset of programmers with specialized hardware
knowledge. To democratize heterogeneous computing,
our goal is to design a new wave of refactoring, testing,
and debugging tools for heterogeneous applications.
Modern software is bloated. Demand for new functionality
has led developers to include more and more features, many
of which become unneeded or unused. This phenomenon, known
as software bloat, makes software consume more
resources and unnecessarily enlarges its attack surface.
To address software bloat, we developed an end-to-end bytecode debloating framework called JDebloat. It augments traditional static reachability analysis with dynamic profiling, and it accounts for new dynamic language features in modern Java. This work is motivated and sponsored by the Office of Naval Research's Total Platform Cyber Protection program and has had a technology-transfer impact on the Navy. Information on debloating can be found here.
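The idea of combining static reachability with dynamic profiling can be illustrated with a minimal sketch. This is not JDebloat's implementation: the call graph, the method names, and the helpers `reachable` and `methods_to_remove` are all hypothetical, and the sketch assumes the profiler simply reports the set of methods it saw executing (for example, ones invoked through reflection, which a static call graph may miss).

```python
from collections import deque

def reachable(call_graph, roots):
    """Breadth-first traversal of a call graph from the given root methods."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def methods_to_remove(all_methods, call_graph, entry_points, observed):
    """Methods that are neither statically reachable nor seen at run time.

    Dynamically observed methods are added as extra roots, so that
    whatever they call is kept as well.
    """
    keep = reachable(call_graph, set(entry_points) | set(observed))
    return set(all_methods) - keep

# Hypothetical program: Plugin.run is only invoked reflectively,
# so no static edge leads to it from Main.main.
call_graph = {
    "Main.main": ["Parser.parse"],
    "Parser.parse": ["Parser.validate"],
    "Plugin.run": ["Plugin.helper"],
}
all_methods = {
    "Main.main", "Parser.parse", "Parser.validate",
    "Plugin.run", "Plugin.helper", "Legacy.export",
}

static_only = methods_to_remove(all_methods, call_graph, ["Main.main"], [])
with_profile = methods_to_remove(all_methods, call_graph, ["Main.main"], ["Plugin.run"])
```

With static analysis alone, the sketch would also discard `Plugin.run` and `Plugin.helper`; adding the profiled methods as roots keeps them and leaves only `Legacy.export` as a removal candidate.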
This quantification and sub-categorization of data scientists is important: although many companies are hiring data scientists and universities are creating new graduate programs, we lack a scientific understanding of who data scientists are.
There is growing interest in leveraging large collections of open-source repositories, such as those hosted on GitHub. Currently, it is difficult for a user to understand the commonalities and variations among a massive number of related code examples.
To advance this new frontier of mining-software-repositories research, we design techniques for ultra-scale API usage mining, interactive visualization, code search, and recommendation.
Code duplication created by copy and paste is common in
large software, and changing software often requires
systematic edits: similar but not identical
enhancements, refactorings, and bug fixes to many similar
methods. We developed novel techniques for example-based program
transformation, clone removal, differential
testing, and code review.
Refactoring is a technique for cleaning up legacy code in preparation for bug fixes or feature additions. To create a scientific foundation for refactoring, we quantified the impact of a multi-year Windows re-architecting effort. We analyzed version history data, conducted a survey of over 300 developers, and interviewed the architects and development leads to assess the impact of refactoring on size, churn, complexity, test coverage, failure, and organization metrics.
We invented a suite
of analysis tools that can help programmers investigate code
modifications. We also developed RefFinder, a logic-query
approach to refactoring reconstruction. Our insight was that
the skeleton of refactoring edits can be expressed as a