Schlosser, S., Cameletti, M., & Toninelli, D. (2019). Optimized Strategies for Enhancing the Territorial Coverage in Twitter Data Collection. Retrieved from http://hdl.handle.net/10446/151749

Optimized Strategies for Enhancing the Territorial Coverage in Twitter Data Collection

Cameletti, Michela; Toninelli, Daniele
2019-01-01

Abstract

Relevance & Research Question: The use of social media as a promising data source has become increasingly important in recent years. Social media data, such as tweets, not only pave the way for new research possibilities, but also raise completely new methodological and substantive questions in many research fields (e.g., social sciences, statistics and so forth). This work aims at finding an efficient and optimized way of collecting data. In particular, we compare different strategies for collecting Twitter data (for example, in order to enhance the territorial coverage of different geographical areas).

Methods & Data: For this purpose, we collected Twitter data across the whole United Kingdom for a period of 90 days, implementing three different parallel tweet collection strategies, set up as follows: 1) the borders of the 12 UK territorial regions (NUTS) were precisely mapped by means of a large number of medium-sized sub-areas (with big cities covered by many smaller sub-areas); 2) the same borders were mapped as precisely as possible while adapting the size of the sub-areas to the actual population density; 3) a large number of small, equally sized sub-areas was used to map the NUTS regions, without considering the population density. In total, we collected more than 300 million tweets, of which 1% include geographical metadata (useful for checking the accuracy of the geo-coordinates used for data collection).

Results: The analysis of tweets that include geographical metadata reveals that these tweets were actually posted in the expected regions. This suggests that the same probably holds for tweets without geographical metadata. Moreover, the strategy based on population density-adapted sub-areas proved to cover the posted tweets most accurately.

Added Value: Our findings indicate that, using our second collection strategy, tweets can be correctly assigned to territorial regions, such as cities or country units. Furthermore, we were able to identify an efficient and exhaustive strategy for collecting Twitter data that balances territorial coverage against the need to work with a reasonably sized dataset.
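The abstract does not say how the sub-areas were constructed, so the following is only an illustrative sketch of the general idea behind a population density-adapted tiling, not the authors' procedure. It assumes a simple quadtree rule (keep splitting a bounding box while its estimated population exceeds a threshold) and a toy population estimate. The names `Box`, `adaptive_tiles`, `share_inside`, `toy_people` and `toy_population`, as well as all coordinates and thresholds, are hypothetical. The last helper mirrors the check described under Results: the share of geotagged points that actually fall inside a given sub-area.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # Half-open intervals, so a point on a shared edge belongs to one tile only.
        return self.west <= lon < self.east and self.south <= lat < self.north

    def split(self) -> List["Box"]:
        # Cut the box into four equal quadrants.
        mid_lon = (self.west + self.east) / 2
        mid_lat = (self.south + self.north) / 2
        return [
            Box(self.west, self.south, mid_lon, mid_lat),
            Box(mid_lon, self.south, self.east, mid_lat),
            Box(self.west, mid_lat, mid_lon, self.north),
            Box(mid_lon, mid_lat, self.east, self.north),
        ]


def adaptive_tiles(box: Box, population: Callable[[Box], float],
                   max_population: float, max_depth: int = 6) -> List[Box]:
    """Split a box until each tile holds at most max_population (assumed rule)."""
    if max_depth == 0 or population(box) <= max_population:
        return [box]
    tiles: List[Box] = []
    for child in box.split():
        tiles.extend(adaptive_tiles(child, population, max_population, max_depth - 1))
    return tiles


def share_inside(tile: Box, points: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (lon, lat) points that fall inside the tile."""
    if not points:
        return 0.0
    return sum(tile.contains(lon, lat) for lon, lat in points) / len(points)


# Toy data (made up): a dense cluster near London, a sparse one near Edinburgh.
toy_people = [(lon, 51.5) for lon in (-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)]
toy_people += [(-3.2, 55.9), (-3.3, 55.9)]


def toy_population(box: Box) -> int:
    # Stand-in for a real population estimate of the area covered by the box.
    return sum(box.contains(lon, lat) for lon, lat in toy_people)


tiles = adaptive_tiles(Box(-8.0, 50.0, 2.0, 60.0), toy_population, max_population=5)
for tile in tiles:
    print(tile, toy_population(tile))
```

In this toy setup the densely populated cluster ends up covered by smaller tiles than the sparse one, which is the qualitative behaviour the second strategy aims for. A real implementation would need actual region boundaries and population data.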
2019
Schlosser, Stephan; Cameletti, Michela; Toninelli, Daniele
Files attached to this record:

GOR19_Proceedings_SS-MC-DT.pdf
Access: repository managers only
Description: Abstract proceedings
Version: publisher's version
License: Aisberg default license
File size: 3.77 MB
Format: Adobe PDF

Schlosser-Optimized_Strategies_for_Enhancing_the_Territorial_Coverage-188.pdf
Access: open access
Description: Presentation slides
Version: publisher's version
License: Creative Commons
File size: 1.34 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/10446/151749