(2025). Detecting Semantic Relationships Among Datasets. Retrieved from https://hdl.handle.net/10446/310987

Detecting Semantic Relationships Among Datasets

Fosci, Paolo; Psaila, Giuseppe
2025-01-01

Abstract

The rise of Big Data has shown that classical relational databases are not suitable: new platforms for managing a wide variety of datasets have become necessary, as demonstrated by the popularity of “data lakes” and “data lakehouses”. A common problem on modern data platforms is detecting pairs of datasets that concern the same topic. Purely syntactic matching is not effective; modern AI techniques for Natural-Language Processing, such as word embedding and sentence embedding, promise to address the problem in a (more or less) semantic way. The contribution of the paper is a novel methodology (called “TopicRank”) for flexibly querying data platforms to find pairs of datasets that concern the same topic, on the basis of the textual descriptions that accompany datasets as meta-data. The paper presents the results of a preliminary experiment conducted on a real pool of datasets.
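The abstract does not describe how TopicRank works internally, so the following is only a minimal sketch of the general idea it mentions: embedding the textual descriptions of datasets and ranking pairs by vector similarity. The names `rank_pairs`, `cosine`, `toy_embed`, `TOY_VOCAB`, and the sample descriptions are all hypothetical. `toy_embed` is a purely syntactic word-count placeholder, used only so the sketch runs without a model; in practice one would pass a real sentence-embedding function as `embed`.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_pairs(
    descriptions: Dict[str, str],
    embed: Callable[[str], Vector],
    threshold: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Score every pair of dataset descriptions and sort by similarity.

    `embed` is an assumption: any function mapping text to a vector,
    e.g. a sentence-embedding model. This is NOT the paper's TopicRank.
    """
    vectors = {name: embed(text) for name, text in descriptions.items()}
    scored = [
        (a, b, cosine(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    ]
    kept = [pair for pair in scored if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical placeholder embedder: counts occurrences of a few fixed words.
# It is syntactic, not semantic; it only stands in for a real embedding model.
TOY_VOCAB = ("air", "pollution", "traffic", "school")


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(word)) for word in TOY_VOCAB]


# Made-up dataset descriptions, for illustration only.
descriptions = {
    "air_quality": "air pollution measurements",
    "smog_alerts": "pollution alerts air",
    "enrolment": "school enrolment figures",
}

ranked = rank_pairs(descriptions, toy_embed)
for name_a, name_b, score in ranked:
    print(f"{name_a} ~ {name_b}: {score:.2f}")
```

Passing the embedder as a parameter keeps the ranking logic independent of the embedding model, so a word-embedding or sentence-embedding backend can be swapped in without changing `rank_pairs`.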
2025
Fosci, Paolo; Carbone, Vincenzo; Leo, Matteo; Marmorato, Andrea; Psaila, Giuseppe; Rosa, Giampiero; Torabi, Mohammadsadegh

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/310987