(2019). Optimized Strategies for Enhancing the Territorial Coverage in Twitter Data Collection. Retrieved from http://hdl.handle.net/10446/151749
Optimized Strategies for Enhancing the Territorial Coverage in Twitter Data Collection
Cameletti, Michela; Toninelli, Daniele
2019-01-01
Abstract
Relevance & Research Question: Social media have become an increasingly important data source in recent years. Social media data, such as tweets, not only open up new research possibilities but also raise new methodological and substantive questions in many research fields (e.g., the social sciences and statistics). This work aims to find an efficient, optimized way of collecting such data. In particular, we compare different strategies for collecting Twitter data, for example with a view to improving the territorial coverage of different geographical areas.

Methods & Data: We collected Twitter data across the whole United Kingdom for a period of 90 days, running three tweet collection strategies in parallel: 1) the borders of the 12 UK territorial regions (NUTS) were precisely mapped by means of a large number of medium-sized sub-areas, with big cities covered by many smaller sub-areas; 2) the same borders were mapped as precisely as possible while adapting the size of the sub-areas to the actual population density; 3) a large number of small, equally sized sub-areas was used to map the NUTS regions, without considering population density. In total, we collected more than 300 million tweets, 1% of which include geographical metadata (useful for checking the accuracy of the data collection's geo-coordinates).

Results: The analysis of the tweets that include geographical metadata reveals that these tweets were actually posted in the expected regions. This suggests that the same probably holds for tweets without geographical metadata. Moreover, the strategy based on sub-areas adapted to population density covered the posted tweets most accurately.

Added Value: Our findings indicate that, using our second collection strategy, tweets can be correctly assigned to territorial regions, such as cities or country units.
Furthermore, we were able to identify an efficient and exhaustive strategy for collecting Twitter data that balances territorial coverage against the need to work with a reasonably sized dataset.

File | Access | Description | Version | License | File size | Format
---|---|---|---|---|---|---
GOR19_Proceedings_SS-MC-DT.pdf | Repository managers only | Abstract proceedings | Publisher's version | Aisberg default license | 3.77 MB | Adobe PDF
Schlosser-Optimized_Strategies_for_Enhancing_the_Territorial_Coverage-188.pdf | Open access | Presentation slides | Publisher's version | Creative Commons | 1.34 MB | Adobe PDF
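
The abstract does not say how the sub-areas of the second strategy were constructed. As one possible illustration of adapting sub-area size to population density, the sketch below recursively splits a bounding box into four quadrants until the estimated population of each tile falls below a threshold. The quadtree approach, the `Box` class, `adaptive_tiles`, and the toy population function are all assumptions made for this example, not the authors' procedure, and the coordinates are abstract units rather than real UK geography.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical helper, not from the source)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # Half-open intervals so that adjacent tiles never both claim a point.
        return self.west <= lon < self.east and self.south <= lat < self.north

    def area(self) -> float:
        return (self.east - self.west) * (self.north - self.south)

    def split(self) -> List["Box"]:
        # Divide the box into four equal quadrants.
        mid_lon = (self.west + self.east) / 2
        mid_lat = (self.south + self.north) / 2
        return [
            Box(self.west, self.south, mid_lon, mid_lat),
            Box(mid_lon, self.south, self.east, mid_lat),
            Box(self.west, mid_lat, mid_lon, self.north),
            Box(mid_lon, mid_lat, self.east, self.north),
        ]


def adaptive_tiles(
    box: Box,
    population: Callable[[Box], float],
    max_population: float,
    max_depth: int = 6,
) -> List[Box]:
    """Split `box` until each tile holds at most `max_population` people
    or the recursion depth limit is reached."""
    if max_depth == 0 or population(box) <= max_population:
        return [box]
    tiles: List[Box] = []
    for child in box.split():
        tiles.extend(adaptive_tiles(child, population, max_population, max_depth - 1))
    return tiles


def toy_population(box: Box) -> float:
    # Assumed toy model: uniform background density of 1 person per unit
    # area, plus a "city" of 1000 people concentrated at point (0.3, 0.3).
    city = 1000.0 if box.contains(0.3, 0.3) else 0.0
    return box.area() + city


region = Box(0.0, 0.0, 4.0, 4.0)
tiles = adaptive_tiles(region, toy_population, max_population=20.0, max_depth=3)
```

With these toy inputs the region ends up covered by 10 tiles: large ones over the sparsely populated parts and progressively smaller ones around the city. Because the tiles do not overlap, a geotagged point can be checked against exactly one tile with `Box.contains`, which mirrors the kind of consistency check described in the Results.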