Maci, S. M., & Abbiati, S. (2026). The researcher’s bias in fake news automatic detection: a case study [journal article]. Linguistics Vanguard. Retrieved from https://hdl.handle.net/10446/318485
The researcher’s bias in fake news automatic detection: a case study
Maci, Stefania Maria; Abbiati, Simone
2026-02-04
Abstract
This study examines the challenges of automatically identifying disinformation about abortion on Twitter. We collected 166,180 tweets posted on Twitter (January–December 2022) about the Supreme Court’s reversal of Roe v. Wade to train large language models (LLMs) and machine learning systems to recognize disinformation about abortion. For this purpose, we created a pilot corpus of 8,309 tweets. Surprisingly, only 0.08% of the pilot corpus contained medical disinformation. We therefore questioned whether the planned machine learning could be carried out, given that semantic ambiguity over what counts as fake in the abortion debate poses significant hurdles. We observed that tweets expressing extreme viewpoints were often labelled as “fake”, highlighting the subjective nature of such categorizations. These findings, strongly influenced by personal ideological views on the topic, emphasize the complexity of using LLMs and machine learning systems to navigate emotionally charged topics. They also stress the importance of considering different perspectives to reduce bias in analyses, advise caution against relying solely on these technologies, and warn of potential bias and “cherry-picking” in data interpretation, especially when researching social media debates, which are full of implicit content, presuppositions, and implicatures, as well as fallacious argumentation.

| File | Description | Version | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| 10.1515_lingvan-2024-0060.pdf | Article | Publisher’s version | Aisberg default license | 556.34 kB | Adobe PDF | Repository administrators only |
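The sampling arithmetic behind the abstract can be illustrated with a short sketch: a 5% random sample of the 166,180 collected tweets yields exactly the 8,309-tweet pilot corpus reported above, and prevalence is the share of items carrying a given annotation. This is a minimal illustration only, not the authors’ pipeline; the function names and the `"label"` field are hypothetical.

```python
# Illustrative sketch (not the authors' code): drawing a 5% pilot sample
# from a tweet collection and measuring the prevalence of a label.
# The dict field names ("text", "label") are hypothetical.
import random

def pilot_sample(tweets, fraction=0.05, seed=42):
    """Randomly sample a pilot corpus from the full collection."""
    rng = random.Random(seed)
    k = round(len(tweets) * fraction)
    return rng.sample(tweets, k)

def prevalence(corpus, label="medical_disinformation"):
    """Share of items annotated with the given label."""
    if not corpus:
        return 0.0
    hits = sum(1 for t in corpus if t.get("label") == label)
    return hits / len(corpus)

# With 166,180 tweets, a 5% sample gives 8,309 items, matching
# the pilot-corpus size reported in the abstract.
full = [{"text": f"tweet {i}", "label": None} for i in range(166_180)]
pilot = pilot_sample(full)
print(len(pilot))  # 8309
```

At 0.08% prevalence, such a pilot corpus would contain only about seven positively labelled tweets, which is why the abstract questions whether supervised training is feasible.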
Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

