Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting

Effective electrical load forecasting depends on the quality of historical data and the efficiency of the forecasting algorithms. However, missing data caused by sensor errors, communication failures, and data processing anomalies is a significant problem: it compromises the integrity of the dataset and reduces forecasting accuracy. Machine learning (ML)-based imputation techniques address this issue by estimating and substituting missing values using the correlations present within the dataset. In this study, four ML-based imputation approaches, namely Random Forest (RF), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost), are applied to improve the accuracy and reliability of electrical load forecasting. A synthetic linear missing-data pattern is introduced into the original dataset, and the imputation methods are evaluated for their effectiveness in restoring data integrity. The evaluation is carried out by feeding the imputed datasets into two deep learning (DL) forecasting models: a Recurrent Neural Network (RNN) and a Gated Recurrent Unit (GRU). Predictive performance is measured with the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2), along with an analysis of computational efficiency. The comparison between the DL models indicates that the RNN requires less computational time, whereas the GRU consistently delivers superior forecasting accuracy across all imputation methods. Among the evaluated imputation techniques, XGBoost achieves the lowest MSE at 6% missing data (894.98 with RNN; 876.62 with GRU), while RF is the most consistent, particularly at higher missing-data rates (MSE: 1259.17 at 30% missingness). These findings highlight the importance of selecting suitable imputation techniques to improve load forecasting in practical applications.
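
The abstract's workflow (mask values, impute them, then score the result with MAE, MSE, RMSE, and R2) can be illustrated with a minimal, standard-library Python sketch. Everything in it is an assumption made for illustration: the sinusoidal load curve is synthetic, the 6% of points are masked at random rather than with the paper's own missing-data pattern, the helper names `knn_impute` and `regression_metrics` are hypothetical, and the nearest-neighbour rule on the time index is a simplified stand-in for the KNN imputer evaluated in the study. The paper scores downstream RNN/GRU forecasts, whereas this sketch scores the imputed values directly.

```python
import math
import random


def knn_impute(series, k=3):
    """Fill None entries with the mean of the k observed values nearest in time."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
            filled[i] = sum(v for _, v in nearest) / len(nearest)
    return filled


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Hypothetical hourly load curve over ten days (synthetic, not the paper's data).
random.seed(0)
load = [500 + 100 * math.sin(2 * math.pi * h / 24) for h in range(240)]

# Assumption: mask 6% of the points at random; the paper's pattern may differ.
missing_idx = sorted(random.sample(range(len(load)), k=int(0.06 * len(load))))
masked = [None if i in set(missing_idx) else v for i, v in enumerate(load)]

# Impute, then score only the positions that were masked.
imputed = knn_impute(masked, k=3)
truth = [load[i] for i in missing_idx]
estimate = [imputed[i] for i in missing_idx]
scores = regression_metrics(truth, estimate)
print(scores)
```

Scoring only the masked positions keeps the untouched observations from inflating the apparent quality of the imputation; repeating the run at higher masking rates (for example 30%) would give a rough feel for how an imputer degrades as missingness grows.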

Hussain, A., Giangrande, P., Franchini, G., Fenili, L., & Messi, S. (2025). Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting. Retrieved from https://hdl.handle.net/10446/313065


Hussain, Ayaz; Giangrande, Paolo; Franchini, Giuseppe; Fenili, Lorenzo; Messi, Silvio

2025-01-01



Publication date: 2025
All authors: Hussain, Ayaz; Giangrande, Paolo; Franchini, Giuseppe; Fenili, Lorenzo; Messi, Silvio
Collection: 1.4.01 Contributi in atti di convegno - Conference presentations

Files attached to this record:

File	File size	Format
C103_merged.pdf (publisher's version; access restricted to archive managers; license: Aisberg default license)	1.56 MB	Adobe PDF



Use this identifier to cite or link to this document: https://hdl.handle.net/10446/313065
