Scalable Distributed Data Anonymization for Large Datasets

k-Anonymity and l-diversity are two well-known privacy metrics that guarantee protection of the respondents of a dataset by obfuscating information that can disclose their identities and sensitive information. Existing solutions for enforcing them implicitly assume a centralized scenario, since they require complete visibility over the dataset to be anonymized, and can therefore have limited applicability to large datasets. In this paper, we propose a solution that extends Mondrian (an efficient and effective approach designed for achieving k-anonymity) to enforce both k-anonymity and l-diversity over large datasets in a distributed manner, leveraging the parallel computation of multiple workers. Our approach efficiently distributes the computation among the workers, without requiring visibility over the dataset in its entirety. Our data partitioning limits the need for workers to exchange data, so that each worker can independently anonymize a portion of the dataset. We implemented our approach with support for parallel execution on a dynamically chosen number of workers. The experimental evaluation shows that our solution scales well without affecting the quality of the resulting anonymization.
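
As a rough illustration of the two metrics and of the centralized Mondrian idea that the paper extends, the following minimal Python sketch recursively splits a small table at the median of its widest quasi-identifier, keeps a split only if both halves still hold at least k rows and l distinct sensitive values, and then generalizes each partition to value ranges. This is not the authors' distributed implementation: the function names, the sample table, and the details of the split rule are assumptions made purely for this example.

```python
from collections import Counter

def is_k_anonymous(rows, qis, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in qis) for r in rows)
    return all(count >= k for count in groups.values())

def is_l_diverse(rows, qis, sensitive, l):
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[q] for q in qis), set()).add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

def mondrian(rows, qis, sensitive, k, l):
    """Hypothetical centralized Mondrian-style partitioning (assumes k >= 1)."""
    def ok(part):
        # A partition is acceptable if it can satisfy both metrics on its own.
        return len(part) >= k and len({r[sensitive] for r in part}) >= l

    def spread(q):
        return max(r[q] for r in rows) - min(r[q] for r in rows)

    # Try the quasi-identifiers from widest to narrowest value range.
    for q in sorted(qis, key=spread, reverse=True):
        values = sorted(r[q] for r in rows)
        median = values[len(values) // 2]
        left = [r for r in rows if r[q] < median]
        right = [r for r in rows if r[q] >= median]
        if ok(left) and ok(right):
            return (mondrian(left, qis, sensitive, k, l)
                    + mondrian(right, qis, sensitive, k, l))
    return [rows]  # no admissible split: this partition is final

def generalize(partition, qis):
    """Replace each quasi-identifier value with the (min, max) range of its partition."""
    ranges = {q: (min(r[q] for r in partition), max(r[q] for r in partition))
              for q in qis}
    return [{**r, **ranges} for r in partition]

# Made-up sample data, for illustration only.
rows = [
    {"age": 25, "zip": 10001, "disease": "flu"},
    {"age": 27, "zip": 10002, "disease": "asthma"},
    {"age": 31, "zip": 10003, "disease": "flu"},
    {"age": 35, "zip": 10004, "disease": "diabetes"},
    {"age": 52, "zip": 10010, "disease": "asthma"},
    {"age": 58, "zip": 10012, "disease": "flu"},
    {"age": 61, "zip": 10011, "disease": "diabetes"},
    {"age": 64, "zip": 10013, "disease": "asthma"},
]
qis = ["age", "zip"]
partitions = mondrian(rows, qis, "disease", k=2, l=2)
anonymized = [row for part in partitions for row in generalize(part, qis)]
```

On this sample, every row of `anonymized` shares its generalized `age` and `zip` ranges with at least one other row whose `disease` differs. How the paper fragments the dataset across workers is not described on this page, so the sketch deliberately stays centralized.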

(2023). Scalable Distributed Data Anonymization for Large Datasets [journal article - articolo]. In IEEE TRANSACTIONS ON BIG DATA. Retrieved from https://hdl.handle.net/10446/236234


De Capitani di Vimercati, Sabrina;Facchinetti, Dario;Foresti, Sara;Livraga, Giovanni;Oldani, Gianluca;Paraboschi, Stefano;Rossi, Matthew;Samarati, Pierangela

2023-01-01



Article type: article
Publication date: 2023
Journal in ANCE: IEEE TRANSACTIONS ON BIG DATA
All authors: De Capitani di Vimercati, Sabrina; Facchinetti, Dario; Foresti, Sara; Livraga, Giovanni; Oldani, Gianluca; Paraboschi, Stefano; Rossi, Matthew; Samara...
Citation: (2023). Scalable Distributed Data Anonymization for Large Datasets [journal article - articolo]. In IEEE TRANSACTIONS ON BIG DATA. Retrieved from https://hdl.handle.net/10446/236234
In collections: 1.1.01 Journal Articles/Essays

Files attached to this record:

File	File size	Format
Scalable_Distributed_Data_Anonymization_for_Large_Datasets.pdf (open access; version: publisher's version; license: Creative Commons)	802.67 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10446/236234
