A syntax-guided multi-task learning approach for Turducken-style code generation
Yang, G. and Zhou, Y. and Chen, X. and Zhang, X. and Xu, Y. and Han, Tingting and Chen, Taolue (2023) A syntax-guided multi-task learning approach for Turducken-style code generation. Empirical Software Engineering 28, ISSN 1382-3256.
2303.05061v2.pdf - Author's Accepted Manuscript. Restricted to repository staff only until 14 October 2024.
Abstract
Owing to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize three significant challenges with regard to syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose a syntax-guided multi-task learning approach, TurduckenGen. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. We then formalize code generation with the syntactic constraint representation as an auxiliary task, which enables the model to learn the syntactic constraints of the code. Finally, syntactically correct code is selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis, comparing our approach with six state-of-the-art baselines on two Turducken-style code datasets, demonstrate its effectiveness and general applicability. We also conducted a human study and found that the code generated by our approach is of better quality than that of the baselines in terms of readability and semantic similarity.
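Two of the steps named in the abstract, appending type information to code tokens and selecting a syntactically correct candidate using compiler feedback, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Python source as the target language, uses lexical token types from the standard `tokenize` module as a stand-in for the syntactic types used in the paper, and uses `ast.parse` as a stand-in for a compiler. The function names and the `token<TYPE>` format are hypothetical.

```python
import ast
import io
import tokenize

# Layout-only tokens carry no text worth annotating in this sketch.
_SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def annotate_tokens(code):
    """Append a lexical type to each token, e.g. 'x<NAME>'.

    Assumption: lexical token types stand in for the syntactic
    type information described in the abstract.
    """
    annotated = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type in _SKIPPED:
            continue
        annotated.append(f"{tok.string}<{tokenize.tok_name[tok.type]}>")
    return annotated


def is_syntactically_valid(code):
    """Use the Python parser as a stand-in for compiler feedback."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def select_candidate(candidates):
    """Return the first candidate that parses.

    Assumption: candidates are ordered best-first by model score.
    If none parses, fall back to the top-ranked candidate.
    """
    for candidate in candidates:
        if is_syntactically_valid(candidate):
            return candidate
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(annotate_tokens("x = 1"))
    print(select_candidate(["def f(:", "def f(): return 1"]))
```

In this sketch the first candidate is rejected because it does not parse, so the second is returned. A real system would call the compiler or parser of the actual target language, including the embedded declarative snippets.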
Metadata
| Field | Value |
|---|---|
| Item Type | Article |
| Keyword(s) / Subject(s) | Syntactically-constrained code generation, Turducken-style code, Multi-task learning, CodeT5, Abstract syntax tree |
| School | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
| Depositing User | Tingting Han |
| Date Deposited | 30 Oct 2023 14:40 |
| Last Modified | 31 Oct 2023 09:56 |
| URI | https://eprints.bbk.ac.uk/id/eprint/52319 |