BIROn - Birkbeck Institutional Research Online

    Training deep code comment generation models via data augmentation

    Zhang, X. and Zhou, Y. and Han, Tingting and Chen, Taolue (2021) Training deep code comment generation models via data augmentation. In: Internetware '20: 12th Asia-Pacific Symposium on Internetware, 12-14 May 2020, Singapore.


    Abstract

    With the development of deep neural networks (DNNs) and the availability of public source code repositories, deep code comment generation models have demonstrated reasonable performance on test datasets. However, it has been confirmed in computer vision (CV) and natural language processing (NLP) that DNNs are vulnerable to adversarial examples. In this paper, we investigate how to maintain the performance of these models on such perturbed samples. We propose a simple but effective method to improve robustness by training the model via data augmentation. We conduct experiments to evaluate our approach on two mainstream sequence-to-sequence (seq2seq) architectures, based on the LSTM and the Transformer respectively, using a large-scale publicly available dataset. The experimental results demonstrate that our method can efficiently improve the capability of different models to defend against perturbed samples.

    Metadata

    Item Type: Conference or Workshop Item (Paper)
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Tingting Han
    Date Deposited: 21 Mar 2023 16:23
    Last Modified: 09 Aug 2023 12:50
    URI: https://eprints.bbk.ac.uk/id/eprint/44291
