Hussain, A., Giangrande, P., Franchini, G., Fenili, L., & Messi, S. (2025). Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting. Retrieved from https://hdl.handle.net/10446/313065

Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting

Hussain, Ayaz; Giangrande, Paolo; Franchini, Giuseppe; Fenili, Lorenzo; Messi, Silvio
2025-01-01

Abstract

Effective electrical load forecasting depends on the quality of historical data and the efficiency of forecasting algorithms. However, missing data caused by sensor errors, communication failures, and data-processing anomalies is a significant problem: it not only compromises the integrity of the dataset but also reduces forecasting accuracy. Machine learning (ML)-based imputation techniques address this issue by estimating and substituting missing values from the correlations inherent in the dataset. In this study, four ML-based imputation approaches, namely Random Forest (RF), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost), are applied to improve the accuracy and reliability of electrical load forecasting. A synthetic linear missing-data pattern is introduced into the original dataset, and the imputation methods are evaluated on how effectively they restore data integrity. The evaluation is performed by feeding each imputed dataset into two deep learning (DL) forecasting frameworks: a Recurrent Neural Network (RNN) and a Gated Recurrent Unit (GRU) network. Predictive performance is measured with the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²), complemented by an analysis of computational efficiency. The comparison of the two DL architectures shows that the RNN requires less computational time, whereas the GRU consistently delivers higher forecasting accuracy across all imputation methods. Among the imputation techniques, XGBoost achieves the lowest MSE at 6% missing data (894.98 with RNN; 876.62 with GRU), while RF is the most consistent, particularly at higher missing-data rates (MSE: 1259.17 at 30% missingness).
These findings highlight the importance of selecting a suitable imputation technique to improve load forecasting performance in practical applications.
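The evaluation loop described in the abstract (mask part of the load series, impute the gaps, then score the result with MAE, MSE, RMSE and R²) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: missing positions are drawn uniformly at random rather than following the study's linear pattern, KNN stands in for all four imputers, errors are scored directly on the imputed values instead of through an RNN or GRU forecaster, and the synthetic series and the helper names (`inject_missing`, `knn_impute`, `regression_metrics`, `synthetic_load`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def inject_missing(values: Sequence[float], rate: float,
                   seed: int = 0) -> List[Optional[float]]:
    """Copy `values`, replacing round(rate * n) entries with None.

    Assumption: positions are drawn uniformly at random (MCAR); this does
    not reproduce the study's "linear" missing-data pattern.
    """
    rng = random.Random(seed)
    missing = set(rng.sample(range(len(values)), round(rate * len(values))))
    return [None if i in missing else v for i, v in enumerate(values)]


def knn_impute(features: Sequence[Sequence[float]],
               target: Sequence[Optional[float]], k: int = 3) -> List[float]:
    """Fill each None in `target` with the mean of the k observed targets
    whose feature vectors are closest in Euclidean distance."""
    observed = [(f, t) for f, t in zip(features, target) if t is not None]
    filled: List[float] = []
    for f, t in zip(features, target):
        if t is not None:
            filled.append(t)  # observed values are kept unchanged
            continue
        nearest = sorted(observed, key=lambda pair: math.dist(f, pair[0]))[:k]
        filled.append(sum(t_obs for _, t_obs in nearest) / len(nearest))
    return filled


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R^2 of y_pred against y_true."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # R^2 is undefined for a constant y_true; report NaN in that case.
        "R2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


def synthetic_load(n_hours: int,
                   seed: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical hourly load driven by time of day and temperature."""
    rng = random.Random(seed)
    features: List[List[float]] = []
    load: List[float] = []
    for i in range(n_hours):
        hour = i % 24
        temp = 15 + 8 * math.sin(2 * math.pi * (hour - 9) / 24) + rng.gauss(0, 1)
        # Cyclic hour encoding plus scaled temperature as imputation features.
        features.append([math.sin(2 * math.pi * hour / 24),
                         math.cos(2 * math.pi * hour / 24), temp / 10])
        load.append(500 + 120 * math.sin(2 * math.pi * (hour - 6) / 24)
                    + 4 * temp + rng.gauss(0, 10))
    return features, load


if __name__ == "__main__":
    features, load = synthetic_load(24 * 30)
    for rate in (0.06, 0.30):
        masked = inject_missing(load, rate, seed=42)
        imputed = knn_impute(features, masked, k=5)
        # Score only the positions that were masked.
        idx = [i for i, v in enumerate(masked) if v is None]
        scores = regression_metrics([load[i] for i in idx],
                                    [imputed[i] for i in idx])
        print(f"{rate:.0%} missing:", {m: round(v, 2) for m, v in scores.items()})
```

Replacing `knn_impute` with an RF, SVR or XGBoost regressor trained on the same feature columns would give a rough analogue of the other three approaches. Scoring only the masked positions isolates imputation error from forecasting error, which the study instead measures downstream through the RNN and GRU models.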
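The finding that the RNN needs less computational time while the GRU forecasts more accurately is consistent with the standard cell equations, sketched below in their common textbook form. The exact variants and layer sizes used in the study are not given here, and the symbols are introduced for illustration: $x_t \in \mathbb{R}^{d_x}$ is the input at step $t$, $h_t \in \mathbb{R}^{d_h}$ the hidden state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
\text{RNN:}\quad & h_t = \tanh\!\left(W_h x_t + U_h h_{t-1} + b_h\right) \\[4pt]
\text{GRU:}\quad & z_t = \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
& r_t = \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
& \tilde{h}_t = \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
& h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \\[4pt]
\text{Parameters:}\quad & P_{\text{RNN}} = d_h\left(d_x + d_h + 1\right), \qquad
P_{\text{GRU}} = 3\,d_h\left(d_x + d_h + 1\right)
\end{aligned}
```

A GRU layer therefore performs roughly three times the matrix multiplications of an equally wide RNN layer at every time step, which is one plausible reason for its longer training time; its update and reset gates, which control how much of the previous state is retained, are commonly credited with better handling of longer dependencies. Some libraries add a second bias vector per gate or swap the roles of $z_t$ and $1 - z_t$, so exact parameter counts vary by implementation.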

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/313065