A Fast Feature Selection for Interpretable Modeling Based on Fuzzy Inference Systems

Large datasets are often beneficial for building predictive models with machine learning approaches. However, not all variables in a dataset necessarily contain useful information: some might be useless, redundant, misleading, or even harmful to performance, in terms of both accuracy and computational effort. For this reason, Feature Selection (FS) is one of the most delicate and important steps in machine learning. It is even more relevant for interpretable models based on Fuzzy Inference Systems (FIS), for two reasons: on the one hand, FIS are generally built on top of a data partitioning based on clustering, which can suffer from high dimensionality; on the other hand, the knowledge base of the FIS, to be understandable in practice, should not contain rules involving too many variables. FS can be performed using multiple approaches, most notably filter and wrapper methods. The latter are often based on evolutionary algorithms, where a population of candidate solutions (each representing a possible set of selected variables) evolves towards the optimal selection. Although wrapper methods can be effective, they are, in general, computationally expensive. In this work, we propose a completely different, and more computationally efficient, algorithm based on Random Forest (RF) models. Specifically, we exploit RFs to rank variables according to their importance. Then, we use that information to perform a statistical analysis and determine the minimal set of features necessary to build an accurate FIS. We show the effectiveness of our approach on two (semi)synthetic datasets built from real-world datasets, and we validate it by applying the FS method to a medical dataset.
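The two steps named in the abstract (rank variables by RF importance, then use a statistical analysis to pick a minimal set) can be illustrated with a small sketch. The abstract does not specify the statistical analysis, so everything below is an assumption made for illustration and not the authors' method: the function name `rank_and_select`, the idea of adding a random `noise_probe` column before training, and the rule "keep a feature if it beats the probe in at least `min_win_rate` of the runs" are all hypothetical. The per-run importances are assumed to come from repeated RF fits (for instance, the `feature_importances_` attribute of a scikit-learn forest); the numbers used here are made-up toy values, not results from the paper.

```python
from statistics import mean


def rank_and_select(importance_runs, probe="noise_probe", min_win_rate=0.95):
    """Rank features by mean importance and keep those that reliably beat a probe.

    importance_runs: list of dicts (one per RF run) mapping feature name -> importance.
    probe: name of a purely random column added before training (an assumption).
    min_win_rate: fraction of runs in which a feature must beat the probe to be kept.
    """
    if not importance_runs:
        raise ValueError("need at least one run of importances")

    features = [f for f in importance_runs[0] if f != probe]
    n_runs = len(importance_runs)

    # Step 1: rank features by their mean importance across runs (highest first).
    mean_importance = {f: mean(run[f] for run in importance_runs) for f in features}
    ranking = sorted(features, key=mean_importance.get, reverse=True)

    # Step 2: a simple stand-in for the statistical analysis. Count how often
    # each feature is more important than the random probe.
    win_rate = {
        f: sum(run[f] > run[probe] for run in importance_runs) / n_runs
        for f in features
    }
    selected = [f for f in ranking if win_rate[f] >= min_win_rate]
    return ranking, selected


# Toy, made-up importances for four hypothetical RF runs (illustration only).
runs = [
    {"x1": 0.50, "x2": 0.30, "x3": 0.05, "noise_probe": 0.15},
    {"x1": 0.45, "x2": 0.35, "x3": 0.12, "noise_probe": 0.08},
    {"x1": 0.55, "x2": 0.25, "x3": 0.06, "noise_probe": 0.14},
    {"x1": 0.48, "x2": 0.32, "x3": 0.07, "noise_probe": 0.13},
]
ranking, selected = rank_and_select(runs, min_win_rate=0.75)
print("ranking:", ranking)
print("selected:", selected)
```

In this toy input, `x3` beats the probe in only one of four runs, so it is ranked but not selected. A real pipeline would replace the win-rate rule with whatever test the paper actually uses and would then build the FIS on the selected variables only.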

(2024). A Fast Feature Selection for Interpretable Modeling Based on Fuzzy Inference Systems. Retrieved from https://hdl.handle.net/10446/297725


Tangherloni, A.; Cazzaniga, Paolo; Stranieri, N.; Buffa, F. M.; Nobile, M. S.

2024-01-01



Publication date: 2024
All authors: Tangherloni, Andrea; Cazzaniga, Paolo; Stranieri, N.; Buffa, F. M.; Nobile, M. S.
In collections: 1.4.01 Contributions in conference proceedings - Conference presentations

File(s) attached to this record:

File	File size	Format
CIBCB 2024 Cazzaniga.pdf (archive managers only; version: postprint - refereed version/accepted without refereeing; license: Aisberg default license)	989.28 kB	Adobe PDF



Use this identifier to cite or link to this document: https://hdl.handle.net/10446/297725
