A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset, where bias can be modeled by using a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset to highlight the impact of tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.
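As a rough illustration of the idea summarized above, the following sketch generates a toy dataset from a three-node structural equation model and measures how one parameter changes a fairness metric. It is not the authors' method: the graph (protected attribute `a` and score `x` both feeding the decision `y`), the coefficient name `bias`, the noise levels, and the functions `generate` and `parity_gap` are all assumptions made for this example.

```python
import random


def generate(n, bias, seed=0):
    """Draw n records from a toy structural equation model (hypothetical).

    Graph: a (protected attribute) -> y (decision) <- x (qualification score).
    `bias` is the assumed coefficient on the a -> y edge; 0.0 means the
    decision does not depend on the protected attribute.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0           # exogenous protected attribute
        x = rng.gauss(0.0, 1.0)                      # exogenous qualification score
        latent = x + bias * a + rng.gauss(0.0, 0.5)  # structural equation for y
        y = 1 if latent > 0.0 else 0                 # thresholded decision
        rows.append((a, x, y))
    return rows


def parity_gap(rows):
    """Demographic parity difference: P(y=1 | a=1) - P(y=1 | a=0)."""
    rate = {}
    for group in (0, 1):
        ys = [y for a, _, y in rows if a == group]
        rate[group] = sum(ys) / len(ys)
    return rate[1] - rate[0]


if __name__ == "__main__":
    # Larger values of the assumed `bias` coefficient should widen the gap.
    for bias in (0.0, 0.5, 1.0):
        print(bias, round(parity_gap(generate(20000, bias)), 3))
```

In this sketch a single coefficient controls how unfair the generated labels are, which loosely mirrors the abstract's point about needing only a limited number of parameters.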

(2022). A Methodology for Controlling Bias and Fairness in Synthetic Data Generation [journal article - articolo]. In APPLIED SCIENCES. Retrieved from http://hdl.handle.net/10446/215988

Barbierato, Enrico; Della Vedova, Marco L.; Tessera, Daniele; Toti, Daniele; Vanoli, Nicola

2022-01-01

Article type: article
Publication date: 2022
Journal in ANCE: APPLIED SCIENCES
All authors: Barbierato, Enrico; Della Vedova, Marco Luigi; Tessera, Daniele; Toti, Daniele; Vanoli, Nicola
Collections: 1.1.01 Articoli/Saggi in rivista - Journal Articles/Essays

Files attached to this record:

File	File size	Format
applsci-12-04619-v2.pdf (open access; version: publisher's version; license: Creative Commons)	7.48 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/215988
