Better than the best? Answers via model ensemble in density-based clustering

With the recent growth in data availability and complexity, and the associated outburst of elaborate modelling approaches, model selection tools have become a lifeline, providing objective criteria to deal with this increasingly challenging landscape. In fact, basing predictions and inference on a single model may be limiting if not harmful; ensemble approaches, which combine different models, have been proposed to overcome the selection step, and proven fruitful especially in the supervised learning framework. Conversely, these approaches have been scantily explored in the unsupervised setting. In this work we focus on the model-based clustering formulation, where a plethora of mixture models, with different number of components and parametrizations, is typically estimated. We propose an ensemble clustering approach that circumvents the single best model paradigm, while improving stability and robustness of the partitions. A new density estimator, being a convex linear combination of the density estimates in the ensemble, is introduced and exploited for group assignment. As opposed to the standard case, where clusters are typically associated to the components of the selected mixture model, we define partitions by borrowing the modal, or nonparametric, formulation of the clustering problem, where groups are linked with high-density regions. Staying in the density-based realm we thus show how blending together parametric and nonparametric approaches may be beneficial from a clustering perspective.

(2021). Better than the best? Answers via model ensemble in density-based clustering [journal article - articolo]. In ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. Retrieved from https://hdl.handle.net/10446/269557

Better than the best? Answers via model ensemble in density-based clustering

Casa, Alessandro;Scrucca, Luca;Menardi, Giovanna

2021-01-01

Abstract

With the recent growth in data availability and complexity, and the associated outburst of elaborate modelling approaches, model selection tools have become a lifeline, providing objective criteria to deal with this increasingly challenging landscape. In fact, basing predictions and inference on a single model may be limiting if not harmful; ensemble approaches, which combine different models, have been proposed to overcome the selection step, and proven fruitful especially in the supervised learning framework. Conversely, these approaches have been scantily explored in the unsupervised setting. In this work we focus on the model-based clustering formulation, where a plethora of mixture models, with different number of components and parametrizations, is typically estimated. We propose an ensemble clustering approach that circumvents the single best model paradigm, while improving stability and robustness of the partitions. A new density estimator, being a convex linear combination of the density estimates in the ensemble, is introduced and exploited for group assignment. As opposed to the standard case, where clusters are typically associated to the components of the selected mixture model, we define partitions by borrowing the modal, or nonparametric, formulation of the clustering problem, where groups are linked with high-density regions. Staying in the density-based realm we thus show how blending together parametric and nonparametric approaches may be beneficial from a clustering perspective.

Scheda breve

Scheda completa

Scheda completa (DC)

	Tipo di articolo
	
				articolo
			
	Data di pubblicazione
	
				2021
			
	Rivista in ANCE
	
				ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
			
	Tutti gli autori
	
						Casa, Alessandro; Scrucca, Luca; Menardi, Giovanna
					
	Citazione
	
				(2021). Better than the best? Answers via model ensemble in density-based clustering  [journal article - articolo]. In ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. Retrieved from https://hdl.handle.net/10446/269557
			
	Nelle collezioni:
	
				1.1.01 Articoli/Saggi in rivista - Journal Articles/Essays

File allegato/i alla scheda:

File	Dimensione del file	Formato
s11634-020-00423-6.pdf accesso aperto Versione: publisher's version - versione editoriale Licenza: Creative commons Dimensione del file 806.34 kB Formato Adobe PDF Visualizza/Apri	806.34 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

Aisberg ©2008 Servizi bibliotecari, Università degli studi di Bergamo | Terms of use/Condizioni di utilizzo

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10446/269557

Citazioni

6

4

social impact