BIROn - Birkbeck Institutional Research Online

    A syntax-guided multi-task learning approach for Turducken-style code generation

    Yang, G. and Zhou, Y. and Chen, X. and Zhang, X. and Xu, Y. and Han, Tingting and Chen, Taolue (2023) A syntax-guided multi-task learning approach for Turducken-style code generation. Empirical Software Engineering 28, ISSN 1382-3256.

    2303.05061v2.pdf - Author's Accepted Manuscript
    Restricted to Repository staff only until 14 October 2024.


    Abstract

    Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize three significant challenges regarding syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. We then formalize code generation with syntactic constraint representation as an auxiliary task so that the model learns the syntactic constraints of the code. Finally, syntactically correct code is selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis, comparing against six state-of-the-art baselines on two Turducken-style code datasets, demonstrate the effectiveness and general applicability of our approach. In addition, a human study found that the code generated by our approach is better than that of the baselines in terms of readability and semantic similarity.
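
The final step described in the abstract, choosing a syntactically correct program from several candidates using compiler feedback, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `python_syntax_ok` and `select_candidate` are hypothetical, Python's built-in `compile()` stands in for whatever compiler or parser the target language would use, and the fallback to the top-ranked candidate when nothing passes is an assumption the abstract does not specify.

```python
from typing import Callable, Optional, Sequence


def python_syntax_ok(source: str) -> bool:
    """Stand-in for compiler feedback: True if `source` parses as Python.

    Assumption: a real Turducken-style setting would call the target
    language's compiler and also check the embedded declarative snippet.
    """
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def select_candidate(
    candidates: Sequence[str],
    is_valid: Callable[[str], bool] = python_syntax_ok,
) -> Optional[str]:
    """Return the first candidate (in ranked order) that passes the check.

    Assumption: if no candidate passes, fall back to the top-ranked one;
    return None only when there are no candidates at all.
    """
    for candidate in candidates:
        if is_valid(candidate):
            return candidate
    return candidates[0] if candidates else None


if __name__ == "__main__":
    ranked = [
        "def f(:\n    pass",          # malformed parameter list
        "def f():\n    return 1",     # parses correctly
    ]
    print(select_candidate(ranked))
```

Passing the validity check as a parameter keeps the selection loop independent of any one language, so a different compiler-backed checker could be swapped in without changing the loop.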

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Syntactically-constrained code generation, Turducken-style code, Multi-task learning, CodeT5, Abstract syntax tree
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Tingting Han
    Date Deposited: 30 Oct 2023 14:40
    Last Modified: 31 Oct 2023 09:56
    URI: https://eprints.bbk.ac.uk/id/eprint/52319

