A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

Della Vedova, Marco L.
2022-01-01

Abstract

The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset, where bias can be modeled by using a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset to highlight the impact of tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.
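The abstract describes bias being injected through a probabilistic network built with structural equation modeling (SEM), with a small set of tunable parameters controlling bias and fairness. The record does not give the model's equations or parameter names, so the following is only a minimal sketch under stated assumptions: a toy linear-Gaussian SEM for a loan-approval style dataset, where a protected attribute `A` can bias the outcome directly and indirectly through income. Every name here (`BiasParams`, `generate`, `demographic_parity_difference`, `a_to_income`, `a_to_approval`, and the rest) is hypothetical, and demographic parity difference is just one common fairness measure, not necessarily the one used in the paper.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class BiasParams:
    """Hypothetical tuning knobs for a toy loan-approval SEM (not the paper's parameters)."""
    p_protected: float = 0.5         # P(A = 1): share of the protected group
    a_to_income: float = 0.0         # effect of A on income: indirect bias through a proxy
    a_to_approval: float = 0.0       # direct effect of A on the approval score
    income_to_approval: float = 1.0  # legitimate effect of income on approval
    noise_sd: float = 1.0            # std. dev. of each exogenous noise term


def generate(n: int, params: BiasParams, seed: int = 0) -> list[tuple[int, float, int]]:
    """Sample n rows (A, income, Y) from a linear structural equation model.

    A      := Bernoulli(p_protected)
    income := a_to_income * A + U_income
    score  := income_to_approval * income + a_to_approval * A + U_score
    Y      := 1 if score > 0 else 0
    """
    rng = random.Random(seed)  # seeded so generated datasets are reproducible
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < params.p_protected else 0
        income = params.a_to_income * a + rng.gauss(0.0, params.noise_sd)
        score = (params.income_to_approval * income
                 + params.a_to_approval * a
                 + rng.gauss(0.0, params.noise_sd))
        rows.append((a, income, 1 if score > 0 else 0))
    return rows


def demographic_parity_difference(rows: list[tuple[int, float, int]]) -> float:
    """Return P(Y=1 | A=1) - P(Y=1 | A=0); 0 means parity on this metric."""
    positives = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for a, _, y in rows:
        totals[a] += 1
        positives[a] += y
    if totals[0] == 0 or totals[1] == 0:
        raise ValueError("both groups must be present in the sample")
    return positives[1] / totals[1] - positives[0] / totals[0]


if __name__ == "__main__":
    # No injected bias vs. direct and indirect bias against the A = 1 group.
    fair = generate(20_000, BiasParams())
    skewed = generate(20_000, BiasParams(a_to_income=-0.5, a_to_approval=-0.5))
    print(f"DPD without injected bias: {demographic_parity_difference(fair):+.3f}")
    print(f"DPD with injected bias:    {demographic_parity_difference(skewed):+.3f}")
```

With both bias coefficients at zero, the two groups share the same score distribution, so the difference should be close to zero. With the negative coefficients, the A = 1 score is normal with mean -1 and variance 2, giving an approval rate of roughly 0.24 against 0.50 for A = 0 in expectation. A linear-Gaussian SEM is used here only because each coefficient maps to one interpretable bias knob; the paper's own network may be structured differently.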
Journal article
2022
Barbierato, Enrico; Della Vedova, Marco Luigi; Tessera, Daniele; Toti, Daniele; Vanoli, Nicola
(2022). A Methodology for Controlling Bias and Fairness in Synthetic Data Generation [journal article]. In Applied Sciences. Retrieved from http://hdl.handle.net/10446/215988
Files attached to this record:
File: applsci-12-04619-v2.pdf
Access: open access
Version: publisher's version
License: Creative Commons
File size: 7.48 MB
Format: Adobe PDF

Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/215988
Citations
  • Scopus 27
  • Web of Science (ISI) 16