Our early, large-scale study scientifically characterized the emerging role of data scientists within software teams, influencing new data science majors at universities. Addressing the need for data-intensive development, we established new directions in software engineering for data-intensive computing and heterogeneous computing. Over the past decade, we've developed eleven data-intensive developer tools, making traditional code-centric analysis viable in complex data-intensive environments.
The rise of big data analytics highlights a critical debugging gap in current computing models, one that forces data scientists into trial and error. To address this gap, we've created data-intensive developer tools for Apache Spark that improve reliability and efficiency by making code-centric analyses such as interactive debugging, fuzz testing, symbolic execution, and taint analysis viable in data-intensive environments. This vision is summarized in our ASE keynote on "Re-engineering SE for data-centric world" and our IEEE Software article on "SE4DA: Software Engineering for Data Analytics."
Data-intensive computing thrives on specialized hardware accelerators, but developing such heterogeneous applications requires niche expertise. To make heterogeneous hardware easier to leverage, we design testing, debugging, and repair tools for applications that incorporate hardware accelerators. This vision for empowering broader development of heterogeneous applications is detailed in our ISSTA keynote on "Software Tools for Democratizing Heterogeneous Computing" (video).
Leveraging vast open-source repositories like GitHub, we address the challenge of understanding commonalities in massive code collections. We design ultra-scale API usage mining, interactive visualization, code search, and recommendation systems, all by actively leveraging code similarity at scale.
Code clones, which stem from copy-and-paste, are common in large software systems and often require systematic edits across similar methods. We pioneered example-based program transformation, which served as the basis for automated patch generation and influenced subsequent work. Our contributions also include an automated clone removal refactoring technique.
Refactoring restructures code to improve its design without changing external behavior. Our work uses code similarity to identify refactorings and track their impact. To establish a scientific foundation, we quantified a Windows re-architecting effort, analyzing version history and developer surveys to assess its impact on bugs, size, complexity, and other metrics. Our group also developed techniques to identify refactorings from version histories and to detect associated bugs.
To address software bloat, which consumes resources and expands attack surfaces, we developed a bytecode debloating framework for modern Java that simplifies software by augmenting static analysis with dynamic profiling, enhancing security. Our work was motivated and sponsored by the Office of Naval Research, and we are one of only five teams selected for its technology transfer to the Navy. Information on debloating can be found here.