Ferretti, C., & Saletta, M. (2023). Naturalness in Source Code Summarization. How Significant is it? Retrieved from https://hdl.handle.net/10446/265011

Naturalness in Source Code Summarization. How Significant is it?

Saletta, Martina
2023-01-01

Abstract

Source code summarization, that is, the description of a program's functionality in short natural-language sentences, is a topic of great interest in the software engineering community: it can help generate software documentation automatically and, more generally, can ease developers' effort in understanding the code they are working on. In this work, conceived as a negative results paper, we study the existing neural models designed for this purpose, pointing out their high sensitivity to the natural elements present in the source code (i.e., comments and identifiers) and the related drop in performance when such elements are ablated or masked. We then propose a novel source code summarization approach that relies on an intermediate pseudo-language, through which we are able to fine-tune the BRIO model, designed for natural language, on source code summarization and to achieve results comparable to those obtained by state-of-the-art source code competitors (e.g., PLBART and CodeBERT). Finally, we discuss the limitations of these NLP-based approaches when transferred to the domain of source code processing, and we provide some insights for further research directions.
2023
Ferretti, Claudio; Saletta, Martina
Files attached to this record:
File: Saletta Conference IEEE.pdf
Access: archive managers only
Version: publisher's version
License: Aisberg default license
File size: 2.95 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10446/265011
Citations
  • Scopus 7
  • Web of Science (ISI) 5