BIROn - Birkbeck Institutional Research Online

    A syntax-guided multi-task learning approach for Turducken-style code generation

    Yang, G. and Zhou, Y. and Chen, X. and Zhang, X. and Xu, Y. and Han, Tingting and Chen, Taolue (2023) A syntax-guided multi-task learning approach for Turducken-style code generation. Empirical Software Engineering 28, ISSN 1382-3256.

    2303.05061v2.pdf - Author's Accepted Manuscript


    Abstract

    Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize three significant challenges regarding syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. We then formalize code generation with syntactic constraint representation as an auxiliary task, enabling the model to learn the syntactic constraints of the code. Finally, the syntactically correct code is selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis, comparing our approach with six state-of-the-art baselines on two Turducken-style code datasets, demonstrate its effectiveness and general applicability. We also conducted a human study and found that the code generated by our approach is of higher quality than that of the baselines in terms of readability and semantic similarity.
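
The first and last steps described in the abstract (appending type information to code tokens, and selecting a syntactically correct candidate using compiler feedback) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain Python as the target language, uses lexer token types from the standard `tokenize` module as a stand-in for the paper's syntactic type information, and uses the built-in `compile()` as the "compiler". The function names `tag_tokens_with_types` and `first_compilable` are hypothetical.

```python
import tokenize
from io import StringIO
from typing import Iterable, List, Optional

# Layout-only tokens that carry no useful type information for this sketch.
_SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def tag_tokens_with_types(source: str) -> List[str]:
    """Append a lexical type to each token, e.g. 'x' becomes 'x<NAME>'.

    Assumption: lexer token types approximate the type information that
    the abstract says is appended to code tokens.
    """
    tagged = []
    for tok in tokenize.generate_tokens(StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        tagged.append(f"{tok.string}<{tokenize.tok_name[tok.exact_type]}>")
    return tagged


def first_compilable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that the compiler accepts, else None.

    Assumption: candidates arrive ranked best-first (for example from
    beam search), so the first syntactically valid one is selected.
    """
    for code in candidates:
        try:
            compile(code, "<candidate>", "exec")  # syntax check only, nothing is executed
        except (SyntaxError, ValueError):
            continue
        return code
    return None


if __name__ == "__main__":
    print(tag_tokens_with_types("x = 1"))
    ranked = ["def f(:\n    pass", "def f():\n    return 1\n"]
    print(first_compilable(ranked))
```

For Turducken-style code, a real check would also need to validate the embedded declarative snippet (for example, an SQL string inside the host program), which this sketch does not attempt.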

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Syntactically-constrained code generation, Turducken-style code, Multi-task learning, CodeT5, Abstract syntax tree
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Tingting Han
    Date Deposited: 30 Oct 2023 14:40
    Last Modified: 14 Oct 2024 00:10
    URI: https://eprints.bbk.ac.uk/id/eprint/52319
