High Performance Computing for Haplotyping: Models and Platforms

The reconstruction of the haplotype pair for each chromosome is an active research topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state of the art on HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. To evaluate GenHap's performance, we generated different instances of synthetic (yet realistic) data exploiting empirical error models of four different sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases along with the read length, which involves a lower number of sub-problems to be distributed on multiple cores.
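To make the HA definition above concrete, here is a minimal Python sketch that rests on several assumptions. The reads are hypothetical toy data encoded as a fragment matrix, with one row per read and one column per heterozygous SNP. A candidate solution is scored with the unweighted Minimum Error Correction (MEC) criterion, a common formulation of HA that this record does not itself spell out. The exhaustive search is a swapped-in illustration only: GenHap instead applies Genetic Algorithms to sub-problems. All function names (read_cost, mec, brute_force_ha, assign_reads) are hypothetical.

```python
from itertools import product

# Hypothetical toy fragment matrix: one row per read, one column per
# heterozygous SNP; '0'/'1' are observed alleles, '-' means not covered.
FRAGMENTS = [
    "011-",
    "-110",
    "0--0",
    "10--",
    "-001",
    "1-01",
    "01-1",  # built from the same haplotype as the first three reads, one allele flipped
]


def read_cost(read, hap):
    """Number of covered sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != "-" and int(r) != h)


def complement(hap):
    """All SNPs are heterozygous, so the second haplotype is the bitwise complement."""
    return tuple(1 - h for h in hap)


def mec(fragments, hap):
    """Unweighted MEC score: each read pays the cost of the closer of the two haplotypes."""
    other = complement(hap)
    return sum(min(read_cost(r, hap), read_cost(r, other)) for r in fragments)


def brute_force_ha(fragments):
    """Exhaustive search over 2^(n-1) haplotypes (first SNP fixed by symmetry).

    Only feasible for tiny inputs; it stands in for the optimizer and is
    not GenHap's genetic algorithm.
    """
    n = len(fragments[0])
    best = None
    for rest in product((0, 1), repeat=n - 1):
        hap = (0,) + rest
        score = mec(fragments, hap)
        if best is None or score < best[0]:
            best = (score, hap)
    return best


def assign_reads(fragments, hap):
    """Assign each read to chromosome 0 (hap) or chromosome 1 (its complement)."""
    other = complement(hap)
    return [0 if read_cost(r, hap) <= read_cost(r, other) else 1 for r in fragments]


if __name__ == "__main__":
    score, hap = brute_force_ha(FRAGMENTS)
    print(score, hap)                     # 1 (0, 1, 1, 0)
    print(assign_reads(FRAGMENTS, hap))   # [0, 0, 0, 1, 1, 1, 0]
```

The search cost doubles with every added SNP. This is why practical tools avoid exhaustive enumeration and split large instances into smaller sub-problems that can be processed in parallel.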

(2019). High Performance Computing for Haplotyping: Models and Platforms. Retrieved from http://hdl.handle.net/10446/136040


Tangherloni, Andrea; Rundo, Leonardo; Spolaor, Simone; Nobile, Marco S.; Merelli, Ivan; Besozzi, Daniela; Mauri, Giancarlo; Cazzaniga, Paolo; Liò, Pietro

2019-01-01



Publication date: 2019
In collection: 1.4.01 Contributi in atti di convegno - Conference presentations

Attached file:

Tangherloni2019_Chapter_HighPerformanceComputingForHap.pdf (publisher's version; access restricted to archive managers; Aisberg default license; 842.26 kB, Adobe PDF)


Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/136040
