Hierarchical Normalized Completely Random Measures to Cluster Grouped Data

In this article, we propose a Bayesian nonparametric model for clustering grouped data. We adopt a hierarchical approach: at the highest level, each group of data is modeled according to a mixture, where the mixing distributions are conditionally independent normalized completely random measures (NormCRMs) centered on the same base measure, which is itself a NormCRM. The discreteness of the shared base measure implies that the processes at the data level share the same atoms. This desirable feature allows observations from different groups to be clustered together. We obtain a representation of the hierarchical clustering model by marginalizing with respect to the infinite-dimensional NormCRMs. We investigate the properties of the clustering structure induced by the proposed model and provide theoretical results concerning the distribution of the number of clusters, within and between groups. Furthermore, we offer an interpretation in terms of a generalized Chinese restaurant franchise process, which allows for posterior inference under both conjugate and nonconjugate models. We develop algorithms for fully Bayesian inference and assess their performance by means of a simulation study and a real-data illustration. Supplementary materials for this article are available online.
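The shared-atoms mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the article's generalized Chinese restaurant franchise: it assumes the Dirichlet process special case (the classical Chinese restaurant franchise), and the function name `sample_crf` and the concentration parameters `alpha` and `gamma` are hypothetical choices made for illustration only.

```python
import random


def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Simulate cluster labels from a classical Chinese restaurant franchise.

    Assumption: Dirichlet process special case only, not the general
    NormCRM construction of the article. Returns one list of dish
    (cluster) labels per group; dishes are shared across groups.
    """
    rng = random.Random(seed)
    dish_tables = []  # dish_tables[k] = number of tables, over all groups, serving dish k
    labels = []
    for n_j in group_sizes:
        table_counts = []  # customers seated at each table of this group
        table_dish = []    # dish served at each table of this group
        group_labels = []
        for _ in range(n_j):
            # Existing table with weight = its occupancy, new table with weight alpha.
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # New table: pick its dish at the franchise level, with weight
                # = number of tables already serving it, or gamma for a new dish.
                dish_weights = dish_tables + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            group_labels.append(table_dish[t])
        labels.append(group_labels)
    return labels


# Usage: three groups of 50 observations each.
example = sample_crf([50, 50, 50])
shared = set(example[0]) & set(example[1])
print("clusters in group 1:", len(set(example[0])))
print("clusters shared by groups 1 and 2:", len(shared))
```

Because every new table draws its dish from a common franchise-level pool, the same cluster label can appear in several groups, which is the between-group clustering the abstract refers to.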

(2020). Hierarchical Normalized Completely Random Measures to Cluster Grouped Data [journal article]. In JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION. Retrieved from http://hdl.handle.net/10446/193473


Argiento, Raffaele; Cremaschi, A.; Vannucci, M.

2020-01-01



Article type	journal article
Publication date	2020
Journal (ANCE)	JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
All authors	Argiento, Raffaele; Cremaschi, A.; Vannucci, M.
In collections	1.1.01 Journal Articles/Essays

Files attached to this record:

File	Access	Version	License	File size	Format
Hierarchical Normalized Completely Random Measures to Cluster Grouped Data.pdf	Archive managers only	Publisher's version	Aisberg default license	2.79 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10446/193473
