Training deep code comment generation models via data augmentation

Zhang, X. and Zhou, Y. and Han, Tingting and Chen, Taolue (2021) Training deep code comment generation models via data augmentation. In: Internetware '20: 12th Asia-Pacific Symposium on Internetware, 12-14 May 2020, Singapore.

Official URL: https://doi.org/10.1145/3457913.3457937

Abstract

With the development of deep neural networks (DNNs) and publicly available source code repositories, deep code comment generation models have demonstrated reasonable performance on test datasets. However, it has been confirmed in computer vision (CV) and natural language processing (NLP) that DNNs are vulnerable to adversarial examples. In this paper, we investigate how to maintain the performance of these models against such perturbed samples. We propose a simple but effective method to improve robustness by training the model via data augmentation. We conduct experiments to evaluate our approach on two mainstream sequence-to-sequence (seq2seq) architectures, based on the LSTM and the Transformer, with a large-scale publicly available dataset. The experimental results demonstrate that our method can efficiently improve the capability of different models to defend against perturbed samples.

Metadata

Item Type:	Conference or Workshop Item (Paper)
School:	Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
Depositing User:	Tingting Han
Date Deposited:	21 Mar 2023 16:23
Last Modified:	07 May 2025 16:57
URI:	https://eprints.bbk.ac.uk/id/eprint/44291
