BAT: A Toolkit for Biomedical Text Augmentation

We introduce BAT (Biomedical Augmentation for Text), a Python package designed to augment textual data in the biomedical domain using a neuro-symbolic pipeline. This approach combines knowledge-driven and data-driven methodologies to generate perturbed versions of text while preserving its original meaning. The package provides two categories of functions: Knowledge-based (KB) perturbation and Transformer-based (TB) perturbation. KB perturbation offers a utility interface to semantic resources for handling medical terminology alongside general-purpose terms, providing both medical and general synonym replacement. TB perturbation leverages language models to generate new augmented sentences through contextual word prediction, back-translation, and rephrasing. BAT is designed to tackle the typical challenges of biomedical text, navigating complex medical jargon and enriching text while maintaining its readability. It is also designed for modularity, allowing seamless integration into existing NLP workflows and processing of inputs ranging from single words and sentences to large corpora and entire datasets. By integrating formalized domain knowledge with state-of-the-art machine learning models, BAT serves as a versatile toolkit for text augmentation across multiple languages, including English as well as lower-resource languages such as Italian, Spanish, and French. It facilitates the generation of diverse, high-quality textual data to support a range of biomedical applications, including creating new training samples, addressing imbalanced distributions, and evaluating model robustness.
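To make the knowledge-based synonym replacement described above more concrete, the following is a minimal sketch and not BAT's actual API. The lexicon `TOY_MEDICAL_SYNONYMS` and the functions `replace_medical_synonyms` and `augment_corpus` are hypothetical names introduced only for illustration; the hand-written dictionary stands in for the semantic resources that the package queries.

```python
import random
import re

# Hypothetical toy lexicon (keys must be lowercase). It only stands in for a
# real semantic resource; it is not part of BAT.
TOY_MEDICAL_SYNONYMS = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
    "dyspnea": ["shortness of breath", "breathlessness"],
}


def replace_medical_synonyms(text, lexicon, rng):
    """Replace every lexicon term found in `text` with a randomly chosen synonym."""
    if not lexicon:
        return text
    # Try longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def pick_synonym(match):
        # Look up the matched term case-insensitively and sample one synonym.
        return rng.choice(lexicon[match.group(0).lower()])

    return pattern.sub(pick_synonym, text)


def augment_corpus(sentences, lexicon, seed=0):
    """Apply synonym replacement to each sentence with a seeded random generator."""
    rng = random.Random(seed)
    return [replace_medical_synonyms(sentence, lexicon, rng) for sentence in sentences]


corpus = [
    "The patient has hypertension and a history of myocardial infarction.",
    "She reported dyspnea on exertion.",
]
for original, augmented in zip(corpus, augment_corpus(corpus, TOY_MEDICAL_SYNONYMS)):
    print(original, "->", augmented)
```

The same function handles a single sentence or a whole list, which mirrors the modular, corpus-level processing the abstract mentions. A transformer-based step such as contextual word prediction could be chained after it, but that would require a pretrained language model and is not shown here.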

Bergomi, L., Parimbelli, E., Pala, D., & Buonocore, T. M. (2025). BAT: A Toolkit for Biomedical Text Augmentation. Retrieved from https://hdl.handle.net/10446/316346

Bergomi, Laura; Parimbelli, Enea; Pala, Daniele; Buonocore, Tommaso M.

2025-01-01


Publication date: 2025
All authors: Bergomi, Laura; Parimbelli, Enea; Pala, Daniele; Buonocore, Tommaso M.
In collections: 1.4.01 Contributi in atti di convegno - Conference presentations

File(s) attached to this record:

File	File size	Format
978-3-031-95841-0 (1) (1).pdf (archive managers only; publisher's version; license: Aisberg default license)	1.22 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10446/316346
