Comparing incomplete sequences via longest common subsequence

Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.

(2019). Comparing incomplete sequences via longest common subsequence [journal article - articolo]. In THEORETICAL COMPUTER SCIENCE. Retrieved from http://hdl.handle.net/10446/150301

Comparing incomplete sequences via longest common subsequence

Castelli, Mauro;Dondi, Riccardo;Mauri, Giancarlo;Zoppis, Italo

2019-01-01

Abstract

Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.

Scheda breve

Scheda completa

Scheda completa (DC)

	Tipo di articolo
	
				articolo
			
	Data di pubblicazione
	
				2019
			
	Rivista in ANCE
	
				THEORETICAL COMPUTER SCIENCE
			
	Tutti gli autori
	
						Castelli, Mauro; Dondi, Riccardo; Mauri, Giancarlo; Zoppis, Italo
					
	Citazione
	
				(2019). Comparing incomplete sequences via longest common subsequence  [journal article - articolo]. In THEORETICAL COMPUTER SCIENCE. Retrieved from http://hdl.handle.net/10446/150301
			
	Nelle collezioni:
	
				1.1.01 Articoli/Saggi in rivista - Journal Articles/Essays

File allegato/i alla scheda:

File	Dimensione del file	Formato
TCS2019IncompleteSequence.pdf Solo gestori di archivio Versione: publisher's version - versione editoriale Licenza: Licenza default Aisberg Dimensione del file 578.65 kB Formato Adobe PDF Visualizza/Apri	578.65 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

Aisberg ©2008 Servizi bibliotecari, Università degli studi di Bergamo | Terms of use/Condizioni di utilizzo

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10446/150301

Citazioni

6

4

social impact