Maci, S. M., & Abbiati, S. (2026). The researcher’s bias in fake news automatic detection: a case study [journal article]. In LINGUISTICS VANGUARD. Retrieved from https://hdl.handle.net/10446/318485

The researcher’s bias in fake news automatic detection: a case study

Maci, Stefania Maria; Abbiati, Simone
2026-02-04

Abstract

This study examines the challenges involved in automatically identifying disinformation about abortion on Twitter. We collected 166,180 tweets posted between January and December 2022 about the Supreme Court’s reversal of Roe v. Wade in order to train large language models (LLMs) and machine learning systems to recognize disinformation about abortion. For this purpose, we created a pilot corpus of 8,309 tweets. Surprisingly, only 0.08% of the pilot corpus contained medical disinformation. We therefore questioned whether the planned machine learning could be carried out, given that semantic ambiguity about what counts as fake in the abortion debate poses significant hurdles. We observed that tweets expressing extreme viewpoints were often labelled as “fake”, highlighting the subjective nature of such categorizations. These findings, strongly influenced by personal ideological views on the topic in question, underscore the complexity of using LLMs and machine learning systems to navigate emotionally charged topics. They also stress the importance of considering different perspectives to reduce bias in analyses, advise caution against relying solely on these technologies, and warn of the problems of potential bias and “cherry-picking” in data interpretation, especially when researching social media debates, which are rife with implicit content, presuppositions, and implicatures, as well as fallacious argumentation.
Attached file:
10.1515_lingvan-2024-0060.pdf
Description: Article
Version: publisher's version
Licence: Aisberg default licence
Format: Adobe PDF
File size: 556.34 kB
Access: archive administrators only

Aisberg ©2008 Library Services, University of Bergamo | Terms of use

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/318485