Building a Query Engine for a Corpus of Open Data

Public administrations openly publish many data sets concerning citizens and territories in order to increase the amount of information available to people, firms and public administrators. As a result, Open Data corpora have become so large that it is impossible to deal with them by hand; consequently, tools that include innovative techniques for querying them are needed. In this paper, we present a technique to select open data sets containing specific pieces of information and to retrieve them from a corpus published by an open data portal. In particular, users can formulate structured queries that are submitted blindly to our search engine prototype (i.e., without knowing the actual structure of the data sets). Our approach reinterprets and mixes several known information retrieval approaches, while at the same time giving a database view of the problem. We implemented this technique within a prototype, which we tested on a corpus containing more than 2000 data sets. We noted that our technique provides focused results compared with the baseline experiments performed with Apache Solr.

Pelucchi, M., Psaila, G., & Toccu, M. P. (2017). Building a Query Engine for a Corpus of Open Data. Retrieved from http://hdl.handle.net/10446/94166

Pelucchi, Mauro; Psaila, Giuseppe; Toccu, Maurizio Pietro

2017-01-01

Appears in collections: 1.4.01 Contributions in conference proceedings - Conference presentations

Files attached to this record:

File	File size	Format
WEBIST_2017_46.pdf (publisher's version; Aisberg default license; access restricted to archive managers)	262.4 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/94166
