Schlosser, S., Toninelli, D., & Cameletti, M. (2021). Comparing Methods to Collect and Geolocate Tweets in Great Britain [journal article]. In JOURNAL OF OPEN INNOVATION. Retrieved from http://hdl.handle.net/10446/173106

Comparing Methods to Collect and Geolocate Tweets in Great Britain

Schlosser, Stephan; Toninelli, Daniele; Cameletti, Michela
2021-01-01

Abstract

In the era of Big Data, the Internet has become one of the main data sources: data can be collected at relatively low cost and used for a wide range of purposes. To support sound decisions in a timely manner in any field, it is essential to increase the efficiency of data production as well as data accuracy and reliability. Within this framework, our paper aims to identify an optimized and flexible method to collect and, at the same time, geolocate social media information over a whole country. In particular, the paper compares three alternative methods of collecting data from the social media platform Twitter. The comparison rests on four main criteria: collection time, dataset size, pre-processing load, and geographic distribution. Our findings for Great Britain identify one of these methods as the best option, since it collects both the highest number of tweets per hour and the highest percentage of unique tweets per hour. Furthermore, this method reduces the computational effort needed to pre-process the collected tweets (e.g., it shows the lowest collection times and the lowest number of duplicates within the geographical areas) and improves territorial coverage when compared to the population distribution. At the same time, setting up this method requires a manageable effort, and the setup is less prone to arbitrary decisions by the researcher.
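Two of the comparison criteria named above, tweets per hour and the share of unique tweets, can be illustrated with a minimal Python sketch. This is not the authors' code: the `CollectionRun` structure, the function names, and the use of tweet IDs to detect duplicates are all assumptions made for illustration, and the data in the usage lines is invented.

```python
from dataclasses import dataclass


@dataclass
class CollectionRun:
    """Hypothetical summary of what one collection method returned."""
    name: str
    tweet_ids: list[str]  # IDs of all collected tweets, duplicates included
    hours: float          # total collection time in hours


def tweets_per_hour(run: CollectionRun) -> float:
    """Raw collection rate: collected tweets divided by collection time."""
    if run.hours <= 0:
        raise ValueError("collection time must be positive")
    return len(run.tweet_ids) / run.hours


def unique_share(run: CollectionRun) -> float:
    """Percentage of collected tweets that are unique (assumed: by ID)."""
    if not run.tweet_ids:
        return 0.0
    return 100.0 * len(set(run.tweet_ids)) / len(run.tweet_ids)


def duplicate_count(run: CollectionRun) -> int:
    """Number of collected tweets that repeat an already seen ID."""
    return len(run.tweet_ids) - len(set(run.tweet_ids))


# Invented toy data, only to show how the functions are called.
toy = CollectionRun(name="toy method", tweet_ids=["1", "2", "2", "3"], hours=2.0)
print(tweets_per_hour(toy), unique_share(toy), duplicate_count(toy))
```

A method with a high raw rate but a low unique share shifts work to the pre-processing phase, which is why the two figures are reported together.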
Type: journal article
Year: 2021
Authors: Schlosser, Stephan; Toninelli, Daniele; Cameletti, Michela
File attached to this record:
Comparing Methods to Collect and Geolocate Tweets in Great Britain.pdf
Access: open access
Version: publisher's version
License: Creative Commons
File size: 2.66 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/173106
Citations
  • Scopus 16