Saletta, M. (2022). Do Neural Transformers Learn Human-Defined Concepts? An Extensive Study in Source Code Processing Domain [journal article]. In ALGORITHMS. Retrieved from https://hdl.handle.net/10446/265012
Do Neural Transformers Learn Human-Defined Concepts? An Extensive Study in Source Code Processing Domain
Saletta, Martina
2022-01-01
Abstract
State-of-the-art neural networks build an internal model of the training data that is tailored to a given classification task. Studying such a model is of interest, and research on explainable artificial intelligence (XAI) therefore investigates whether, in the internal states of a network, it is possible to identify rules that associate data with their corresponding classification. This work moves toward XAI research on neural networks trained to classify source code snippets, in the specific domain of cybersecurity. In this context, textual instances typically have to be encoded into numerical vectors through non-invertible transformations before being fed to the models, and this limits the applicability of known XAI methods based on differentiating neural signals with respect to real-valued instances. We start from the established TCAV method, designed to study the human-understandable concepts that emerge in the internal layers of a neural network, and we adapt it to transformer architectures trained to solve source code classification problems. We first define domain-specific concepts (e.g., the presence of given patterns in the source code), and for each concept we train support vector classifiers to separate the points in the activation spaces that represent input instances exhibiting the concept from those that do not. Then, we study whether the presence (or absence) of such concepts affects the decision process of the neural network. Finally, we discuss how our approach contributes to general XAI goals and suggest specific applications in the field of source code analysis.
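The following is a minimal, self-contained sketch of the concept-probing step described in the abstract, not the authors' actual code: a linear support vector classifier is fitted to separate the activation vectors of snippets that exhibit a human-defined concept from those of snippets that do not. The activation matrices below are random placeholders standing in for per-snippet layer outputs of a trained source-code transformer; the layer width and sample sizes are illustrative assumptions.

```python
# Sketch of concept probing on one internal layer of a trained transformer.
# In practice, each row of the activation matrices would be the layer's
# output (e.g., a pooled hidden state) for one source code snippet.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
hidden_dim = 768  # assumed layer width (placeholder)

# Placeholder activations: one row per snippet, with and without the concept.
acts_with_concept = rng.normal(loc=0.5, size=(200, hidden_dim))
acts_without_concept = rng.normal(loc=0.0, size=(200, hidden_dim))

X = np.vstack([acts_with_concept, acts_without_concept])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = concept present

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Linear SVC: its separating hyperplane plays the role of a concept
# direction in the activation space, in the spirit of TCAV.
clf = LinearSVC(C=1.0, max_iter=10_000)
clf.fit(X_tr, y_tr)

# Held-out accuracy well above chance suggests the layer linearly encodes
# the concept; clf.coef_ then gives the corresponding concept vector.
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```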
| File | Access | Version | License | File size | Format |
|---|---|---|---|---|---|
| algorithms-15-00449-v2.pdf | Open access | Publisher's version | Creative Commons | 719.93 kB | Adobe PDF |