Saletta, M. (2023). Naturalness in Source Code Summarization. How Significant is it? Retrieved from https://hdl.handle.net/10446/265011
Naturalness in Source Code Summarization. How Significant is it?
Saletta, Martina
2023-01-01
Abstract
Research in source code summarization, that is, describing the functionality of a program with short natural-language sentences, is a topic of great interest in the software engineering community, since it can help to automatically generate software documentation and, more generally, ease the effort of developers in understanding the code they are working on. In this work, conceived as a negative results paper, we study existing neural models designed for this purpose, pointing out their high sensitivity to the natural elements present in source code (i.e., comments and identifiers) and the related drop in performance when such elements are ablated or masked. We then propose a novel source code summarization approach based on an intermediate pseudo-language, through which we are able to fine-tune the BRIO natural-language model on source code summarization and achieve results comparable to those obtained by state-of-the-art source code competitors (e.g., PLBART and CodeBERT). We finally discuss the limitations of these NLP-based approaches when transferred to the domain of source code processing, and we provide some insights for further research directions.
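As a rough illustration of the comment-and-identifier ablation mentioned above, the following is a minimal sketch (not the paper's actual pipeline) that strips comments and replaces user-defined names with generic placeholders using Python's standard `tokenize` module; the placeholder scheme (`VAR0`, `VAR1`, ...) and the example snippet are assumptions made only for demonstration.

```python
import io
import keyword
import tokenize

def mask_naturalness(code: str) -> str:
    """Drop comments and replace every non-keyword identifier (including
    builtins, in this crude sketch) with a generic placeholder, so only the
    structural skeleton of the program remains."""
    mapping = {}        # original identifier -> placeholder
    out_tokens = []
    reader = io.StringIO(code).readline
    for tok_type, tok_str, *_ in tokenize.generate_tokens(reader):
        if tok_type == tokenize.COMMENT:
            continue    # ablate comments entirely
        if tok_type == tokenize.NAME and not keyword.iskeyword(tok_str):
            tok_str = mapping.setdefault(tok_str, f"VAR{len(mapping)}")
        out_tokens.append((tok_type, tok_str))
    return tokenize.untokenize(out_tokens)

if __name__ == "__main__":
    snippet = (
        "def average(values):\n"
        "    # compute the arithmetic mean\n"
        "    total = sum(values)\n"
        "    return total / len(values)\n"
    )
    print(mask_naturalness(snippet))
```

Feeding such masked code to a summarization model, and comparing the resulting summaries against those produced from the original code, is one way to probe how much these models rely on natural-language cues rather than on program structure.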
| File | File size | Format |
|---|---|---|
| Saletta Conference IEEE.pdf (publisher's version, Aisberg default license; access restricted to archive managers) | 2.95 MB | Adobe PDF |