A Transformer-based Approach for Source Code Summarization

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang, in ACL (short), 2020.

Abstract

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, it is crucial to learn a code representation that models the pairwise relationships between code tokens and thereby captures their long-range dependencies. To learn code representation for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., absolute encoding of source code tokens’ positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
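
To make the contrast between absolute and relative position encoding concrete, the following is a minimal sketch of self-attention with relative position terms added to the keys. It is an illustration under stated assumptions, not the released implementation: the function names, the clipping distance `k`, the choice to add relative terms only to the keys, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math


def clipped_relative_positions(n, k):
    """Return an n x n matrix of offsets j - i, clipped to the range [-k, k]."""
    return [[max(-k, min(k, j - i)) for j in range(n)] for i in range(n)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(x, rel_key_emb, k):
    """Single-head self-attention with relative position terms on the keys.

    x           : list of n token vectors of dimension d, used directly as
                  queries, keys, and values (no learned projections; assumption).
    rel_key_emb : dict mapping each clipped offset in [-k, k] to a d-dim vector
                  (hypothetical stand-in for learned relative embeddings).
    """
    n, d = len(x), len(x[0])
    rel = clipped_relative_positions(n, k)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            # The key depends on the offset between positions i and j,
            # not on the absolute index of either token.
            key = [x[j][t] + rel_key_emb[rel[i][j]][t] for t in range(d)]
            dot = sum(q * kv for q, kv in zip(x[i], key))
            scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(weights[j] * x[j][t] for j in range(n)) for t in range(d)])
    return out


# Usage: three 2-dimensional token vectors, clipping distance k = 1,
# and zero relative embeddings (which reduces to plain self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zero_emb = {offset: [0.0, 0.0] for offset in range(-1, 2)}
attended = relative_self_attention(tokens, zero_emb, k=1)
```

Because the position signal enters only through the offset `j - i`, the same token pair at the same distance receives the same position term wherever it occurs in the sequence, which is the property that distinguishes relative from absolute encoding.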

Bib Entry

@inproceedings{ahmad2020transformer,
  author = {Ahmad, Wasi and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
  title = {A Transformer-based Approach for Source Code Summarization},
  booktitle = {ACL (short)},
  year = {2020},
  presentation_id = {https://virtual.acl2020.org/paper_main.449.html}
}
