Uncertainty quantification in multi-class segmentation: Comparison between Bayesian and non-Bayesian approaches in a clinical perspective

Scalco, Elisa; Pozzi, Silvia; Rizzo, Giovanna; Lanzarone, Ettore

doi:10.1002/mp.17189

Background: Automatic segmentation techniques based on Convolutional Neural Networks (CNNs) are widely adopted to identify structures of interest in medical images, as they are not time-consuming and are not subject to high intra- and inter-operator variability. However, their adoption in clinical practice is slowed by several factors, including the difficulty of accurately quantifying their uncertainty.

Purpose: This work aims to evaluate the uncertainty quantification provided by two Bayesian and two non-Bayesian approaches for a multi-class segmentation problem, and to compare the risk propensity among these approaches, using CT images of patients affected by renal cancer (RC).

Methods: Four uncertainty quantification approaches were implemented in this work, based on a benchmark CNN currently employed in medical image segmentation: two Bayesian CNNs with different regularizations (Dropout and DropConnect), named BDR and BDC, an ensemble method (Ens), and a test-time augmentation (TTA) method. They were compared in terms of segmentation accuracy, using the Dice score; in terms of uncertainty quantification, using the ratio of correct-certain pixels (RCC) and the ratio of incorrect-uncertain pixels (RIU); and with respect to inter-observer variability in manual segmentation. They were trained on the dataset of the Kidney and Kidney Tumor Segmentation Challenge launched in 2021 (KiTS21), for which multi-class segmentations of kidney, RC, and cyst on 300 CT volumes are available. They were tested on this dataset and on two other public renal CT datasets.

Results: Segmentation accuracy differed widely across the structures of interest for all approaches, with an average Dice score of 0.92, 0.58, and 0.21 for kidney, tumor, and cyst, respectively. In terms of uncertainty, TTA provided the highest uncertainty, followed by Ens and BDC, whereas BDR provided the lowest and was less effective than the other approaches at minimizing the number of incorrect-certain pixels. Again, large differences were seen across the three structures in terms of RCC and RIU. These metrics were associated with different risk propensity: BDR was the most risk-taking approach, able to provide higher accuracy in its predictions, but without assigning uncertainty to incorrect segmentations in every case. The other three approaches were more conservative, providing large uncertainty regions, with the drawback of also raising alerts on correctly segmented areas. Finally, the analysis of inter-observer segmentation variability showed a significant variation among the four approaches on the external dataset, with BDR reporting the lowest agreement (Dice = 0.82) and TTA obtaining the highest score (Dice = 0.94).

Conclusions: Our results highlight the importance of quantifying segmentation uncertainty and show that decision-makers can choose the approach most in line with the degree of risk propensity required by the application and their policy.
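The metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions of RCC and RIU, so the normalizations below are assumptions made for illustration only: RCC is taken as correct-and-certain pixels over all correct pixels, and RIU as incorrect-and-uncertain pixels over all incorrect pixels. The function names, the flat-list mask representation, and the uncertainty threshold are likewise hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice overlap for one class: 2 * |P and T| / (|P| + |T|)."""
    p = [x == label for x in pred]
    t = [x == label for x in truth]
    overlap = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumed convention: a class absent from both masks scores 1.0.
    return 1.0 if total == 0 else 2.0 * overlap / total


def certainty_ratios(
    pred: Sequence[int],
    truth: Sequence[int],
    uncertainty: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (rcc, riu) under the ASSUMED definitions:
    rcc = correct-and-certain pixels / all correct pixels
    riu = incorrect-and-uncertain pixels / all incorrect pixels
    A pixel counts as uncertain when its uncertainty exceeds `threshold`.
    """
    correct = correct_certain = 0
    incorrect = incorrect_uncertain = 0
    for p, t, u in zip(pred, truth, uncertainty):
        uncertain = u > threshold
        if p == t:
            correct += 1
            correct_certain += not uncertain
        else:
            incorrect += 1
            incorrect_uncertain += uncertain
    rcc = correct_certain / correct if correct else float("nan")
    riu = incorrect_uncertain / incorrect if incorrect else float("nan")
    return rcc, riu


if __name__ == "__main__":
    # Toy flattened masks: 0 = background, 1 = kidney, 2 = tumor (made-up data).
    truth = [0, 1, 1, 2, 2, 0]
    pred = [0, 1, 2, 2, 2, 0]
    unc = [0.1, 0.2, 0.9, 0.8, 0.1, 0.05]
    print(dice_score(pred, truth, 1), dice_score(pred, truth, 2))
    print(certainty_ratios(pred, truth, unc, threshold=0.5))
```

Under these assumed definitions, a risk-taking method in the abstract's sense would tend toward a high RCC and a lower RIU, while a conservative method would flag more incorrect pixels as uncertain at the cost of also flagging correct ones.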

(2024). Uncertainty quantification in multi-class segmentation: Comparison between Bayesian and non-Bayesian approaches in a clinical perspective [journal article - articolo]. In MEDICAL PHYSICS. Retrieved from https://hdl.handle.net/10446/271389


Article type: article
Publication date: 2024
Journal in ANCE: MEDICAL PHYSICS
All authors: Scalco, Elisa; Pozzi, Silvia; Rizzo, Giovanna; Lanzarone, Ettore
In collections: 1.1.01 Articoli/Saggi in rivista - Journal Articles/Essays

Files attached to this record:

File	Access	Version	License	File size	Format
Medical Physics - 2024 - Scalco - Uncertainty quantification in multiclass segmentation Comparison between Bayesian and.pdf	open access	publisher's version	Creative Commons	1.96 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10446/271389
