(2021). Model-based clustering for categorical data via Hamming distance. Retrieved from http://hdl.handle.net/10446/194004

Model-based clustering for categorical data via Hamming distance

Argiento, Raffaele;
2021-01-01

Abstract

This work introduces a model-based approach for clustering categorical data with no natural ordering. The proposed method uses the Hamming distance to define a family of probability mass functions for categorical data. The members of this family serve as kernels of a finite mixture model with an unknown number of components. Fully Bayesian inference is carried out with a sampling strategy based on a trans-dimensional blocked Gibbs sampler, which simplifies computation relative to the customary reversible-jump algorithm. Model performance is assessed in a simulation study, which shows improvements in both prediction and estimation over existing approaches. Finally, the method is illustrated with applications to reference datasets.
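
As a rough illustration of the idea of building a probability mass function from the Hamming distance, the sketch below computes the distance between two categorical vectors and evaluates a pmf that decays exponentially with the distance from a center vector. The exponential form, the function names (hamming, hamming_pmf) and the parameters (center, sigma, n_levels) are assumptions made here for illustration; the abstract does not specify the exact family used by the authors.

```python
import math
from itertools import product


def hamming(x, y):
    """Number of positions at which two equal-length categorical vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


def hamming_pmf(x, center, sigma, n_levels):
    """Illustrative pmf proportional to exp(-hamming(x, center) / sigma).

    Assumption (not taken from the abstract): attribute j can take
    n_levels[j] distinct values, so the normalising constant factorises
    over attributes.
    """
    d = hamming(x, center)
    decay = math.exp(-1.0 / sigma)
    # Each attribute contributes 1 (match) plus (m - 1) * decay (mismatches).
    norm = math.prod(1.0 + (m - 1) * decay for m in n_levels)
    return decay ** d / norm


# Two attributes, each with three unordered levels.
levels = ["a", "b", "c"]
center = ("a", "b")
total = sum(hamming_pmf(x, center, 0.5, [3, 3]) for x in product(levels, repeat=2))
```

Under these assumptions the probabilities over all nine possible vectors sum to one (up to floating-point error), the center is the mode, and smaller values of sigma concentrate more mass on it.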
2021
English
CLADAG 2021: Book of abstracts and short papers, 13th Scientific Meeting of the Classification and Data Analysis Group - Firenze, September 9-11, 2021
Porzio, Giovanni Camillo; Rampichini, Carla; Bocci, Chiara;
978-88-5518-340-6
128
31
31
online
Italy
Firenze
FUP (Firenze University Press)
CLADAG 2021: 13th Scientific Meeting of the Classification and Data Analysis Group, online, Firenze, 9-11 September 2021
13th
Virtual conference (Firenze, Italy)
9-11 September 2021
SIS (Italian Statistical Society)
Scientific sector SECS-S/01 - Statistics
Hamming distribution; mixture modelling; categorical data analysis; blocked Gibbs Sampling;
Argiento, Raffaele; Filippi-Mazzola, Edoardo; Paci, Lucia
open
3
1.4 Contributions in conference proceedings::1.4.02 Conference abstracts
Not defined
274
info:eu-repo/semantics/conferenceObject
File(s) attached to this record:
File: Argiento3.pdf
Access: open access
Version: publisher's version
License: Creative Commons
File size: 654.97 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/194004