A multiply-labeled tree (or MUL-tree) is a rooted tree in which every leaf is labeled by an element from some set X, but in which more than one leaf may be labeled by the same element of X. MUL-trees have applications in many fields. In phylogenetics, they can represent the evolution of gene families, where genes are represented by the species they belong to, the non-uniqueness of leaf-labels coming from the fact that a given genome may contain many paralogous genes. In this paper, we consider two problems related to the leaf-pruning (leaf removal) of MUL-trees leading to single-labeled trees. First, given a set of MUL-trees, the MUL-tree Set Pruning for Consistency (MULSETPC) Problem asks for a pruning of each tree leading to a set of consistent trees, i.e. a collection of label-isomorphic single-labeled trees. Second, processing each gene tree at a time, the MUL-tree Pruning for Reconciliation (MULPR) Problem asks for a pruning minimizing a reconciliation cost with a given species tree. We show that MULTSETPC is NP-hard and that MULPR is W[2]-hard when parameterized by the duplication cost. We then develop a polynomial-time heuristic for MULPR and show its accuracy by comparing it to a brute-force exact method on a set of gene trees from the Ensembl Genome Browser.

(2021). Complexity and Algorithms for MUL-Tree Pruning . Retrieved from http://hdl.handle.net/10446/200854

Complexity and Algorithms for MUL-Tree Pruning

Dondi, Riccardo;
2021-01-01

Abstract

A multiply-labeled tree (or MUL-tree) is a rooted tree in which every leaf is labeled by an element from some set X, but in which more than one leaf may be labeled by the same element of X. MUL-trees have applications in many fields. In phylogenetics, they can represent the evolution of gene families, where genes are represented by the species they belong to, the non-uniqueness of leaf-labels coming from the fact that a given genome may contain many paralogous genes. In this paper, we consider two problems related to the leaf-pruning (leaf removal) of MUL-trees leading to single-labeled trees. First, given a set of MUL-trees, the MUL-tree Set Pruning for Consistency (MULSETPC) Problem asks for a pruning of each tree leading to a set of consistent trees, i.e. a collection of label-isomorphic single-labeled trees. Second, processing each gene tree at a time, the MUL-tree Pruning for Reconciliation (MULPR) Problem asks for a pruning minimizing a reconciliation cost with a given species tree. We show that MULTSETPC is NP-hard and that MULPR is W[2]-hard when parameterized by the duplication cost. We then develop a polynomial-time heuristic for MULPR and show its accuracy by comparing it to a brute-force exact method on a set of gene trees from the Ensembl Genome Browser.
2021
Gascon, Mathieu; Dondi, Riccardo; El-Mabrouk, Nadia
File allegato/i alla scheda:
File Dimensione del file Formato  
IWOCA2021.pdf

Solo gestori di archivio

Versione: publisher's version - versione editoriale
Licenza: Licenza default Aisberg
Dimensione del file 783.08 kB
Formato Adobe PDF
783.08 kB Adobe PDF   Visualizza/Apri
Pubblicazioni consigliate

Aisberg ©2008 Servizi bibliotecari, Università degli studi di Bergamo | Terms of use/Condizioni di utilizzo

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10446/200854
Citazioni
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact