Fosci, P., & Psaila, G. (2025). Detecting Semantic Relationships Among Datasets. Retrieved from https://hdl.handle.net/10446/310987
Detecting Semantic Relationships Among Datasets
Fosci, Paolo; Psaila, Giuseppe
2025-01-01
Abstract
The new context of Big Data has shown that classical relational databases are not suitable: new platforms for managing an enormous variety of datasets have become necessary, as witnessed by the popularity of “data lakes” and “data lakehouses”. A common problem in modern data platforms is detecting pairs of datasets that concern the same topic. However, purely syntactic matching is not effective; modern AI techniques for Natural Language Processing, such as word embeddings and sentence embeddings, promise to address the problem in a (more or less) semantic way. The contribution of the paper is a novel methodology, called “TopicRank”, for flexibly querying data platforms in order to find pairs of datasets that concern the same topic, based on the textual descriptions that accompany datasets as metadata. The paper presents the results of a preliminary experiment conducted on a real pool of datasets.
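The abstract does not describe how TopicRank actually scores descriptions. Purely as an illustration of the general idea of embedding-based matching (not the authors' method), the sketch below ranks pairs of datasets by the cosine similarity of vectors assumed to come from a sentence-embedding model applied to each dataset's textual description. The dataset names, the three-dimensional toy vectors, the function names and the 0.5 threshold are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical, hand-made vectors standing in for the output of a real
# sentence-embedding model applied to each dataset's textual description.
DESCRIPTION_VECTORS = {
    "air_quality_2023": [0.9, 0.1, 0.0],
    "pm10_sensor_readings": [0.8, 0.2, 0.1],
    "city_budget_2023": [0.0, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no topical information
    return dot / (norm_u * norm_v)


def rank_dataset_pairs(vectors, threshold=0.0):
    """Return (name_a, name_b, score) for every unordered pair of datasets
    whose description vectors reach the threshold, best match first."""
    pairs = []
    for (a, va), (b, vb) in combinations(sorted(vectors.items()), 2):
        score = cosine_similarity(va, vb)
        if score >= threshold:
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


if __name__ == "__main__":
    for a, b, score in rank_dataset_pairs(DESCRIPTION_VECTORS, threshold=0.5):
        print(f"{a} <-> {b}: {score:.3f}")
```

With these toy vectors only the two air-quality datasets pass the 0.5 threshold, even though their names share no words; a purely syntactic comparison of names would miss this pair, which is the motivation the abstract gives for semantic matching.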

| File | Version | License | Access | Size | Format |
|---|---|---|---|---|---|
| Detecting Semantic Relationships Fosci Paolo_ridotto.pdf | Publisher's version | Aisberg default license | Restricted to archive managers | 239.04 kB | Adobe PDF |
Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

