(2022). Dictionary-based Classification of Tweets About Environment [journal article]. In JOURNAL OF MATHEMATICS AND STATISTICAL SCIENCE. Retrieved from http://hdl.handle.net/10446/203283

Dictionary-based Classification of Tweets About Environment

Cameletti, Michela;Fabris, Silvia;Schlosser, Stephan;Toninelli, Daniele
2022-01-01

Abstract

In the era of social media, the wide availability of digital big data (e.g. posts sent through social networks or unstructured data scraped from websites) enables new types of research in a wide range of fields. Such data are available at low cost and in almost real time. Nevertheless, their collection and analysis are challenging. This paper proposes an unsupervised dictionary-based method to filter tweets related to a specific topic, in this case the environment. We start from the tweets sent by a selection of Official Social Accounts clearly linked to the subject of interest. Then, we identify a list of expressions (bigrams, trigrams and hashtags) that make up the topic-oriented dictionary. Our approach has several advantages: it keeps the researcher's interventions and decisions, as well as the processing time, to a minimum; it relies mostly on combinations of words (rather than single words), which eases the identification of tweets concerning the topic of interest; and it does not depend on a pre-defined dictionary, so it can be customized and generalized to other topics. We test the method by applying the resulting dictionary to a sample of more than 3.5 million geolocated tweets posted in Great Britain between January and May 2019. The method performed very well on all evaluation criteria. In particular, accuracy, sensitivity and the F1 score were equal to or higher than 98.4%; specificity and precision were also excellent (around 97.5%), higher than those of the most common selection methods currently in use.
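
The workflow outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the `min_count` threshold and all function names are assumptions made for illustration, and the paper's actual rules for selecting expressions may differ. The sketch builds a dictionary of recurring bigrams, trigrams and hashtags from seed tweets, flags a tweet as on-topic when it contains any dictionary expression, and computes the five evaluation measures named in the abstract from binary labels.

```python
import re
from collections import Counter

# Assumed tokenizer: hashtags are kept whole, other tokens are lowercase words.
TOKEN_RE = re.compile(r"#\w+|[a-z']+")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def expressions(text):
    """Return the set of hashtags, bigrams and trigrams found in a text."""
    tokens = tokenize(text)
    found = {t for t in tokens if t.startswith("#")}
    # Simplification: n-grams are built from the words left after hashtags are dropped.
    words = [t for t in tokens if not t.startswith("#")]
    for n in (2, 3):
        found.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return found

def build_dictionary(seed_tweets, min_count=2):
    """Keep expressions that occur in at least `min_count` seed tweets.

    `min_count` is a hypothetical selection rule, not taken from the paper.
    """
    counts = Counter()
    for tweet in seed_tweets:
        counts.update(expressions(tweet))
    return {expr for expr, c in counts.items() if c >= min_count}

def is_on_topic(text, dictionary):
    """A tweet is selected if it contains at least one dictionary expression."""
    return bool(expressions(text) & dictionary)

def _ratio(num, den):
    return num / den if den else 0.0

def evaluate(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Toy seed tweets standing in for posts from official accounts (invented examples).
seed = [
    "Climate change is real #ClimateAction",
    "We must fight climate change now",
    "Join #ClimateAction today",
]
dictionary = build_dictionary(seed)
```

Matching on multi-word expressions rather than single words, as the abstract argues, reduces false positives from ambiguous terms; in this toy setup a tweet mentioning only "change" would not be selected, while one containing "climate change" would.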
Article
2022
Cameletti, Michela; Fabris, Silvia; Schlosser, Stephan Heinrich; Toninelli, Daniele
Files attached to this record:
dictionarybased.pdf (open access)
Version: publisher's version
License: Creative Commons
File size: 364.4 kB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/10446/203283