Bayesian nonparametric clustering and association studies for candidate SNP observations

Clustering is often the first step in the analysis of an enormous amount of Single Nucleotide Polymorphism (SNP) genotype data. The lack of biological information could affect the outcome of such a procedure. Even when a clustering procedure has been selected and performed, the impact of its uncertainty on the subsequent association analysis is rarely assessed. In this research we first propose a model to cluster SNP data, and then we assess the association between the clusters and a disease. In particular, we adopt a Dirichlet process mixture model, which has two advantages over the usual clustering methods: the number of clusters need not be known and fixed in advance, and the variation in the assignment of SNPs to clusters can be accounted for. In addition, once a clustering of SNPs is obtained, we design an individualized genetic score quantifying the SNP composition in each cluster for every subject. This lets us set up a generalized linear model for association analysis that incorporates the information from a large-scale SNP dataset with a much smaller number of explanatory variables. Inference on cluster allocation, on the strength of association of each cluster (the collective effect of the SNPs in the same cluster), and on the susceptibility of each SNP is based on posterior samples from Markov chain Monte Carlo methods and the Binder loss information. We exemplify this Bayesian nonparametric strategy in a genome-wide association study of Crohn’s disease in a case-control setting.
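As a rough illustration of how posterior samples and the Binder loss can yield a single clustering, the sketch below picks, among sampled partitions, the one that minimises the expected Binder loss with equal misclassification costs. It is not the authors' implementation: the function names (`coclustering_matrix`, `posterior_similarity`, `binder_point_estimate`) are hypothetical, the toy label vectors are invented for the example, and restricting the search to the sampled partitions is an assumption made here for simplicity.

```python
import numpy as np

def coclustering_matrix(labels):
    """0/1 matrix whose (i, j) entry is 1 when items i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def posterior_similarity(samples):
    """Average the co-clustering matrices over all sampled partitions."""
    return np.mean([coclustering_matrix(s) for s in samples], axis=0)

def binder_point_estimate(samples):
    """Return the sampled partition with the smallest expected Binder loss.

    With equal costs, the expected loss of a candidate partition is the sum
    over pairs i < j of |1[c_i == c_j] - p_ij|, where p_ij is the posterior
    probability that items i and j are clustered together.
    """
    psm = posterior_similarity(samples)
    upper = np.triu_indices(psm.shape[0], k=1)  # pairs with i < j
    losses = [np.abs(coclustering_matrix(s) - psm)[upper].sum() for s in samples]
    best = int(np.argmin(losses))
    return np.asarray(samples[best]), psm

# Toy "MCMC output" (invented): four sampled allocations of five SNPs.
samples = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 2, 2, 2],
]
estimate, psm = binder_point_estimate(samples)
print(estimate)
```

The matrix `psm` also summarises the uncertainty in the allocation: entries far from 0 and 1 flag pairs of SNPs whose co-clustering is not well determined by the samples.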

(2017). Bayesian nonparametric clustering and association studies for candidate SNP observations [journal article]. In INTERNATIONAL JOURNAL OF APPROXIMATE REASONING. Retrieved from http://hdl.handle.net/10446/193467

Wang, C.; Ruggeri, F.; Hsiao, C. K.; Argiento, Raffaele

2017-01-01

Article type: article
Publication date: 2017
Journal in ANCE: INTERNATIONAL JOURNAL OF APPROXIMATE REASONING
All authors: Wang, C.; Ruggeri, F.; Hsiao, C. K.; Argiento, Raffaele
Citation: (2017). Bayesian nonparametric clustering and association studies for candidate SNP observations [journal article]. In INTERNATIONAL JOURNAL OF APPROXIMATE REASONING. Retrieved from http://hdl.handle.net/10446/193467
In collections: 1.1.01 Articoli/Saggi in rivista - Journal Articles/Essays

Files attached to this record:

File	Access	Version	License	File size	Format
revision_2_4aperto.pdf	Open Access from 01/02/2019	postprint - refereed version/accepted without refereeing	Creative Commons	1.2 MB	Adobe PDF
1-s2.0-S0888613X16301190-main.pdf	Archive managers only	publisher's version	Aisberg default license	902.51 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/193467
