We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients: species sampling mixture models of Gaussian distributions on the one hand, and a deterministic clustering procedure (DBSCAN) on the other. Two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition that is coarser than the one induced by the species sampling mixture. Since this procedure depends on the value of the threshold, we suggest a strategy to fix it. In addition, we discuss implementation and applications of the model, and compare it with more standard clustering algorithms. Supplementary materials for the article are available online.
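The thresholding rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes univariate Gaussian kernels, uses the Hellinger distance as one possible distance between densities (the abstract does not name the distance), and reads the DBSCAN step as taking connected components of the graph that links latent parameters closer than the threshold. The names `hellinger`, `threshold_partition`, and `eps` are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two univariate Gaussians given as (mean, sd).

    Assumption: the choice of distance is illustrative only.
    """
    (m1, s1), (m2, s2) = p, q
    var_sum = s1 * s1 + s2 * s2
    # Bhattacharyya coefficient of the two Gaussian densities.
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))


def threshold_partition(latent, eps):
    """Cluster labels for one draw of the latent parameters.

    latent: list of (mean, sd) tuples, one per observation; ties between
            observations encode the partition induced by the mixture.
    eps:    threshold on the distance between densities.
    Distinct latent parameters closer than eps are linked, and clusters are
    the connected components of the resulting graph (assumed reading).
    """
    unique = sorted(set(latent))
    parent = list(range(len(unique)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if hellinger(unique[i], unique[j]) < eps:
                parent[find(i)] = find(j)

    index = {theta: k for k, theta in enumerate(unique)}
    roots = [find(index[theta]) for theta in latent]
    relabel = {}
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    return [relabel.setdefault(r, len(relabel)) for r in roots]


# Hypothetical latent parameters for five observations in one posterior draw.
latent = [(0.0, 1.0), (0.0, 1.0), (0.2, 1.0), (5.0, 1.0), (5.1, 1.2)]
print(threshold_partition(latent, eps=0.3))
print(threshold_partition(latent, eps=0.05))
```

With a very small `eps` the labels reduce to the ties among latent parameters, that is, the partition induced by the mixture itself; a larger `eps` merges nearby components, which is the sense in which the resulting partition is coarser. In a full analysis this step would be repeated for each posterior draw of the latent parameters.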
(2014). A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models [journal article]. In Journal of Computational and Graphical Statistics. Retrieved from http://hdl.handle.net/10446/193124
A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models
Argiento, Raffaele;
2014-01-01
File | File size | Format | |
---|---|---|---|
12-3_JCGS_4aperto.pdf | 1.41 MB | Adobe PDF | View/Open |

Open Access from 22/10/2015
Description: "This is an Accepted Manuscript version of the following article, accepted for publication in Journal of Computational and Graphical Statistics. 'Raffaele Argiento, Andrea Cremaschi, Marina Vannucci. (2020) Hierarchical Normalized Completely Random Measures to Cluster Grouped Data. Journal of the American Statistical Association 115:529, pages'. It is deposited under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way."
Version: postprint - refereed version / accepted without refereeing
License: Creative Commons
Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use