Interactive Code Review for Systematic Changes

Developers often need to examine program changes during code reviews. However, it is difficult for developers to inspect systematic changes, that is, similar, related changes that are scattered across multiple files. Developers cannot easily answer questions such as “what other code locations changed similar to this change?” and “are there any other locations that are similar to this code but are not updated?” Critics assists developers by (1) allowing them to customize a context-aware change template, (2) searching for systematic changes using the template, and (3) detecting missing or inconsistent edits. Developers can interactively refine the customized change template and see the corresponding search results. Critics has the potential to improve developer productivity in inspecting large, scattered edits during code reviews. Critics is instantiated as an Eclipse plug-in and is currently maintained by Tianyi Zhang and Myoungkyu Song. [pdf] [demo] [website]

Interactive Code Review for Tangled Program Changes

Developers often package changes from multiple programming tasks into a single code review, resulting in large and loosely related changes. Such changes are usually interleaved with each other, making them hard to understand during code review. During my summer internship in 2016, I worked with my mentors at MSR and the TSE group to build a customized CodeFlow client that allows developers to untangle, manage, and annotate program changes to ease change comprehension during peer code reviews. The CodeFlow client has been instrumented to gather telemetry data and has been deployed internally at Microsoft. The project is still ongoing, and we would like to see whether developers jump back and forth less when understanding a program change and whether they spend less time on code review tasks with the assistance of our customized CodeFlow client.

Automated Test Reuse and Differential Testing for Code Clones

Code reuse via copying and pasting is commonly practiced in software development. However, developers often find it difficult to check the behavioral consistency of the reused code due to a lack of test cases. In this project, we build the first test reuse and differential testing framework that reuses test cases from the original program and cross-checks the runtime behavior of the reused code via differential testing. The test reuse problem is challenging due to variations in program source structure and calling contexts. We therefore introduce a novel code transplantation technique to identify such variations and adapt the code for test reuse. We have evaluated this technique on 52 pairs of similar code (i.e., clones) in real-world projects and demonstrated its usefulness by detecting 2X more seeded faults compared with a static cloning bug finder.
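The cross-checking idea can be illustrated with a minimal sketch. It is not the framework described above: it skips code transplantation entirely and assumes the original and the clone already share a call signature. The functions `original_clamp` and `cloned_clamp`, the seeded fault, and the input list are all hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Outcome = Tuple[str, Any]


def original_clamp(value: int, low: int, high: int) -> int:
    """Hypothetical original code that already has test inputs."""
    return max(low, min(value, high))


def cloned_clamp(value: int, low: int, high: int) -> int:
    """Hypothetical copy-pasted clone with a seeded off-by-one fault."""
    return max(low, min(value, high - 1))


def run(func: Callable[..., Any], args: Sequence[Any]) -> Outcome:
    """Run func and capture either its return value or its exception type."""
    try:
        return ("ok", func(*args))
    except Exception as exc:  # an exception is also observable behavior
        return ("error", type(exc).__name__)


def differential_test(
    original: Callable[..., Any],
    clone: Callable[..., Any],
    inputs: Sequence[Sequence[Any]],
) -> List[Tuple[Sequence[Any], Outcome, Outcome]]:
    """Return every input on which the original and the clone disagree."""
    mismatches = []
    for args in inputs:
        expected = run(original, args)
        actual = run(clone, args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Inputs reused from the (hypothetical) test suite of the original code.
reused_inputs = [(5, 0, 10), (-3, 0, 10), (42, 0, 10)]
mismatches = differential_test(original_clamp, cloned_clamp, reused_inputs)

for args, expected, actual in mismatches:
    print(f"inputs={args}: original={expected}, clone={actual}")
```

Only the input that reaches the upper bound exposes the seeded fault, which shows why reusing boundary-covering tests from the original matters.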

Mining Massive Software Corpora for Code Reuse

GitHub has accumulated millions of open-source projects, and the number is still growing. This tremendous amount of code enables many potential code reuse scenarios that were not possible before. Prior research has focused on mining software repositories to retrieve API usage examples for functionality reuse. However, open-source code repositories capture not only functional code but also valuable assets such as test cases, specifications (often as annotations), and bug fixes. The goal of this project is to exploit large-scale code repositories to support code reuse in various domains, e.g., automated test generation, specification inference, bug detection, and automated repair. This project will investigate methods to (1) support various query modalities to accommodate different code reuse scenarios and (2) automatically generalize retrieved assets into reusable templates that can be applied to a target program.

Miscellaneous Projects

  • APIForce: Interactively Documenting and Testing REST APIs. REST is a great mechanism for content delivery in web services and has served well for over two decades. However, developers often find it hard to understand and debug REST APIs, since REST APIs can be very complex and developers have to look at multiple locations to piece together what is going on in a single transaction. To help developers better understand and use REST APIs, I built a framework that automatically aggregates all supported REST APIs of a Spring MVC application in a web portal and allows people to interactively explore and test them. [presentation]
  • Automated Detection of Fault-Inducing Changes Using Bytecode Analysis. Developers often modify existing software to add new features, fix bugs, and refactor code. However, small changes can have colossal, unexpected non-local effects. This is especially conspicuous in object-oriented programming due to the extensive use of dynamic dispatching and subtyping. Although regression testing can expose unexpected behavior in a modified program, it still takes a significant amount of time and effort to manually debug and identify the root causes of failed tests. In this project, we build an Eclipse plugin, ChangeDebugger, that automatically detects the responsible edits by analyzing the semantic impact of a set of program changes at the bytecode level. [pdf] [presentation]
  • Analyzing Performance Differences between Multiple Code Revisions. Optimizing program performance has always been an important goal due to the proliferation of mobile and distributed applications with limited processing and memory resources. Nevertheless, software developers are often unaware of the impact of their code modifications on program performance. Diff utilities are a well-established method for reviewing code changes and identifying bugs, but they only provide information on syntactic changes. In this project, we build a tool, PerfDiff, that enables developers to quickly identify performance changes at function-level granularity between two versions of a program. PerfDiff is integrated with git using a pre-commit hook. Every time a new revision is committed, PerfDiff is triggered to measure the performance of the new revision using an existing profiler, gprof. By analyzing the program changes and comparing the performance of the two revisions, PerfDiff is capable of detecting the edit that causes a performance regression. [pdf] [presentation]
  • TwitterTrends: Monitoring and Visualizing Hot Topics on Twitter. As one of the mainstream social platforms, Twitter is recognized for its ability to quickly post, share, and disseminate information across the Internet. However, traditional information retrieval and data mining techniques cannot help people quickly and thoroughly grasp hot events on Twitter due to its unique text features and propagation mode. In this project, I designed a new approach, TwitterTrends, to monitor hot topics on Twitter. Compared with traditional topic detection and monitoring systems, it models and visualizes tweets in a user-interaction network that delivers richer information, such as the active users and opinion leaders in a topic. It also monitors the change of topic popularity over time by analyzing the graph density and network centrality of user-interaction networks.
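
The graph density and network centrality mentioned for TwitterTrends can be sketched as follows. This is not the TwitterTrends implementation: it assumes a directed network in which an edge points from the user who replies or retweets to the user being addressed, and it uses in-degree centrality as one plausible way to surface opinion leaders. The user names and interactions are made up.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_network(interactions: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Collect users and unique directed edges, ignoring self-interactions."""
    users: Set[str] = set()
    edges: Set[Edge] = set()
    for source, target in interactions:
        users.update((source, target))
        if source != target:
            edges.add((source, target))
    return users, edges


def density(users: Set[str], edges: Set[Edge]) -> float:
    """Directed graph density: existing edges over n * (n - 1) possible edges."""
    n = len(users)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


def in_degree_centrality(users: Set[str], edges: Set[Edge]) -> Dict[str, float]:
    """Fraction of the other users who interacted with each user."""
    counts = {user: 0 for user in users}
    for _, target in edges:
        counts[target] += 1
    scale = max(len(users) - 1, 1)
    return {user: count / scale for user, count in counts.items()}


# Hypothetical (source, target) interactions within one topic.
interactions = [
    ("alice", "dana"),
    ("bob", "dana"),
    ("carol", "dana"),
    ("dana", "alice"),
    ("bob", "dana"),  # repeated interaction, counted once
]

users, edges = build_network(interactions)
topic_density = density(users, edges)
centrality = in_degree_centrality(users, edges)
leader = max(centrality, key=centrality.get)

print(f"density={topic_density:.3f}, most central user={leader}")
```

Recomputing the density over successive time windows gives one rough signal of how a topic's popularity changes, under the assumptions stated above.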