Maci, S. M., & Abbiati, S. (2026). The researcher’s bias in fake news automatic detection: a case study [journal article]. Linguistics Vanguard. Retrieved from https://hdl.handle.net/10446/318485
The researcher’s bias in fake news automatic detection: a case study
Maci, Stefania Maria; Abbiati, Simone
2026-02-04
Abstract
This study examines the challenges of automatically identifying disinformation about abortion on Twitter. We collected 166,180 tweets posted on Twitter (January–December 2022) about the Supreme Court’s reversal of Roe v. Wade to train large language models (LLMs) and machine learning systems to recognize disinformation about abortion. For this purpose, we created a pilot corpus of 8,309 tweets. Surprisingly, only 0.08% of the pilot corpus contained medical disinformation. We therefore questioned whether the planned machine learning could be carried out, given that semantic ambiguity over what counts as fake in the abortion debate poses significant hurdles. We observed that tweets expressing extreme viewpoints were often labelled as “fake”, highlighting the subjective nature of such categorizations. These findings, strongly influenced by personal ideological views on the topic, emphasize the complexity of using LLMs and machine learning systems to navigate emotionally charged topics. They also stress the importance of considering different perspectives to reduce bias in analyses, advise caution against relying solely on these technologies, and warn of potential bias and “cherry-picking” in data interpretation, especially when researching social media debates, which are full of implicit content, presuppositions, and implicatures, as well as fallacious argumentation.

| File | Description | Version | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| 10.1515_lingvan-2024-0060.pdf | Article | Publisher’s version | Aisberg default license | 556.34 kB | Adobe PDF | Repository administrators only |
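The sampling arithmetic behind the abstract can be illustrated with a short sketch: a 5% random sample of the 166,180 collected tweets yields exactly the 8,309-tweet pilot corpus reported above, and prevalence is the share of items carrying a given annotation. This is a minimal illustration only, not the authors’ pipeline; the function names and the `"label"` field are hypothetical.

```python
# Illustrative sketch (not the authors' code): drawing a 5% pilot sample
# from a tweet collection and measuring the prevalence of a label.
# The dict field names ("text", "label") are hypothetical.
import random

def pilot_sample(tweets, fraction=0.05, seed=42):
    """Randomly sample a pilot corpus from the full collection."""
    rng = random.Random(seed)
    k = round(len(tweets) * fraction)
    return rng.sample(tweets, k)

def prevalence(corpus, label="medical_disinformation"):
    """Share of items annotated with the given label."""
    if not corpus:
        return 0.0
    hits = sum(1 for t in corpus if t.get("label") == label)
    return hits / len(corpus)

# With 166,180 tweets, a 5% sample gives 8,309 items, matching
# the pilot-corpus size reported in the abstract.
full = [{"text": f"tweet {i}", "label": None} for i in range(166_180)]
pilot = pilot_sample(full)
print(len(pilot))  # 8309
```

At 0.08% prevalence, such a pilot corpus would contain only about seven positively labelled tweets, which is why the abstract questions whether supervised training is feasible.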
Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

