Cluster Analysis of Curved-Shaped Data with Species-Sampling Mixture Models

We are interested in clustering data whose support is “curved”. Recently we have addressed this problem by introducing a model which combines two ingredients: species sampling mixtures of parametric densities on the one hand, and a deterministic clustering procedure (DBSCAN) on the other. In short, under this model two observations share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold. In this case, however, the prior cluster assignment is based on the geometry of the space of kernel densities rather than on a direct elicitation of a random partition prior. Following the latter alternative, a new hierarchical model for clustering is proposed here, where the data in each cluster are parametrically distributed around a curve (principal curve), and the prior cluster assignment is given on the latent variables at the second level of the hierarchy according to a species sampling model. These two mixture models are compared here with respect to the cluster estimates obtained for a simulated bivariate dataset from two clusters, one of which is banana-shaped.
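The threshold rule of the first model can be illustrated with a small sketch. It is not the authors' implementation. It assumes univariate Gaussian kernels, the Hellinger distance between densities, and a hypothetical threshold `eps`; none of these choices is stated in the abstract. It also replaces DBSCAN with plain connected components, which corresponds to DBSCAN when every point counts as a core point. The names `hellinger_gauss` and `threshold_clusters` are hypothetical.

```python
import math


def hellinger_gauss(p, q):
    """Hellinger distance between two univariate Gaussians given as (mean, sd)."""
    (m1, s1), (m2, s2) = p, q
    var_sum = s1 ** 2 + s2 ** 2
    # Bhattacharyya coefficient of the two Gaussian densities (closed form).
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))


def threshold_clusters(params, eps):
    """Label observations by the connected components of the graph that links
    latent parameters whose densities are closer than eps (assumed rule)."""
    n = len(params)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hellinger_gauss(params[i], params[j]) < eps:
                parent[find(i)] = find(j)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical latent parameters (mean, sd), one pair per observation.
latent = [(0.0, 1.0), (0.3, 1.0), (0.6, 1.1), (5.0, 0.5), (5.2, 0.5)]
labels = threshold_clusters(latent, eps=0.2)
print(labels)
```

With these illustrative values the labels are `[0, 0, 0, 1, 1]`. The first and third densities are slightly farther apart than `eps`, yet they end up in the same cluster because both are close to the second one. This chaining is what lets a distance-based rule follow a curved support.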

Argiento, R., Cremaschi, A., & Guglielmi, A. (2013). Cluster Analysis of Curved-Shaped Data with Species-Sampling Mixture Models. Retrieved from http://hdl.handle.net/10446/193986


Argiento, Raffaele; Cremaschi, Andrea; Guglielmi, Alessandra

2013-01-01



Publication date: 2013
All authors: Argiento, Raffaele; Cremaschi, Andrea; Guglielmi, Alessandra
Collection: 1.4.01 Contributi in atti di convegno - Conference presentations

File attached to this record:

File	Access	Version	License	File size	Format
SCo2013Aegiento_et_al.pdf	Archive managers only	Publisher's version	Aisberg default license	542.1 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/193986
