Objective: Computing patient similarity is of great interest in precision oncology, since it supports clustering and subgroup identification and can eventually lead to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods that compute patient similarities by fusing heterogeneous data sources with the available knowledge. Materials and Methods: In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assessed the accuracy of the proposed method (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set from an acute myeloid leukemia (AML) study. The results were then compared with those of traditional similarity calculation methods. Results: In the analysis of the synthetic data sets, where the ground truth is known, we measured the ability to reconstruct the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained for the different clusters and measured their statistical difference by means of the log-rank test. In the presence of noise and sparse data, our data integration method outperforms other techniques on both the synthetic and the AML data. Discussion: When multiple heterogeneous data sources are available, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated that this approach can be efficiently applied to discover meaningful patient similarities and may therefore be considered a reliable data-driven strategy for defining new research hypotheses in precision oncology.
Conclusion: The better performance of the proposed approach is an advantage over previous methods in providing accurate patient similarities that support precision medicine.
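As a rough illustration of the idea named in the abstract, the sketch below factorizes a single non-negative patient-by-feature matrix R into F S G^T with standard multiplicative updates and then derives patient similarities from the patient factor F. It is not the authors' implementation: the function names `trifactorize` and `patient_similarity`, the ranks, the toy data, and the use of cosine similarity between rows of F are all assumptions made for this example, and the joint model over several data and knowledge sources described in the abstract is not reproduced here.

```python
import numpy as np


def trifactorize(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (n x m) as F @ S @ G.T.

    Hypothetical helper: plain multiplicative updates for the squared
    Frobenius error, with no orthogonality or knowledge-based constraints.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    F = rng.random((n, k1)) + eps   # patient factor (assumed role)
    S = rng.random((k1, k2)) + eps  # compressed interaction matrix
    G = rng.random((m, k2)) + eps   # feature factor (assumed role)
    errors = []
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed.
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(R - F @ S @ G.T))
    return F, S, G, errors


def patient_similarity(F, eps=1e-12):
    """Cosine similarity between rows of F (an assumption of this sketch)."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    unit = F / (norms + eps)
    return unit @ unit.T


if __name__ == "__main__":
    # Toy data: two groups of five patients with block-structured features
    # plus a small amount of uniform noise.
    rng = np.random.default_rng(1)
    R = np.kron(np.eye(2), np.ones((5, 4))) + 0.05 * rng.random((10, 8))
    F, S, G, errors = trifactorize(R, k1=2, k2=2)
    sim = patient_similarity(F)
    print("reconstruction error:", errors[-1])
    print("similarity matrix shape:", sim.shape)
```

The resulting similarity matrix could then be passed to any clustering routine. Extending the sketch to several relation matrices that share the patient factor would be the natural next step toward a joint model, but that extension is left out here.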

Vitali, F.; Marini, S.; Pala, D.; Demartini, A.; Montoli, S.; Zambelli, A.; Bellazzi, R. (2018). Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia [journal article]. In JAMIA Open. Retrieved from https://hdl.handle.net/10446/316950

Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia

Pala D.;
2018-01-01

article
2018
Vitali, F.; Marini, S.; Pala, Daniele; Demartini, A.; Montoli, S.; Zambelli, A.; Bellazzi, R.
Files attached to this record:
File: ooy008 (2).pdf
Access: open access
Version: publisher's version
License: Creative Commons
File size: 734.09 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/316950
Citations
  • Scopus 19
  • Web of Science (ISI) 18