Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.

(2019). Comparing incomplete sequences via longest common subsequence [journal article - articolo]. In THEORETICAL COMPUTER SCIENCE. Retrieved from http://hdl.handle.net/10446/150301

Comparing incomplete sequences via longest common subsequence

Dondi, Riccardo;
2019

Abstract

Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.
articolo
Castelli, Mauro; Dondi, Riccardo; Mauri, Giancarlo; Zoppis, Italo
(2019). Comparing incomplete sequences via longest common subsequence [journal article - articolo]. In THEORETICAL COMPUTER SCIENCE. Retrieved from http://hdl.handle.net/10446/150301
File allegato/i alla scheda:
File Dimensione del file Formato  
TCS2019IncompleteSequence.pdf

Solo gestori di archivio

Versione: publisher's version - versione editoriale
Licenza: Licenza default Aisberg
Dimensione del file 578.65 kB
Formato Adobe PDF
578.65 kB Adobe PDF   Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

Aisberg ©2008 Servizi bibliotecari, Università degli studi di Bergamo | Terms of use/Condizioni di utilizzo

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10446/150301
Citazioni
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 1
social impact