Effective electrical load forecasting depends on the quality of historical data and the efficiency of the forecasting algorithms. However, missing data caused by sensor errors, communication failures, and data processing anomalies is a significant problem: it compromises the integrity of the dataset and reduces forecasting accuracy. Machine learning (ML) based imputation techniques address this issue by estimating and substituting the missing values using the correlations present within the dataset. In this study, four ML-based imputation approaches, namely Random Forest (RF), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost), are applied to enhance the accuracy and reliability of electrical load forecasting. A synthetic linear missing data pattern is introduced into the original dataset, and the imputation methods are evaluated for their effectiveness in restoring data integrity. The evaluation is carried out by feeding the imputed datasets into two deep learning (DL) forecasting frameworks: Recurrent Neural Network (RNN) and Gated Recurrent Unit (GRU). Predictive performance is measured with Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2), along with an analysis of computational efficiency. The comparison between the DL structures indicates that the RNN requires less computational time, whereas the GRU consistently delivers superior forecasting accuracy across all imputation methods. Among the evaluated imputation techniques, XGBoost achieves the lowest MSE at 6% missing data (894.98 with RNN; 876.62 with GRU), while RF is the most consistent, particularly at higher missing data rates (MSE: 1259.17 at 30% missingness).
These findings highlight the importance of selecting suitable imputation techniques to improve load forecasting in practical applications.
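The mask, impute, and score loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, and several parts are assumptions: the load series is a hypothetical sinusoid, values are masked at random positions (the abstract does not specify how its "linear" missing pattern is built), the KNN imputer uses only the time index as a feature, and the metrics score the imputed values directly instead of a downstream RNN or GRU forecast. The helper names `mask_values`, `knn_impute`, and `regression_metrics` are illustrative.

```python
import math
import random


def mask_values(series, fraction, seed=0):
    """Return a copy with `fraction` of the entries set to None, plus their indices.

    Assumption: positions are chosen uniformly at random.
    """
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = sorted(rng.sample(range(len(series)), n_missing))
    hidden = set(missing_idx)
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, missing_idx


def knn_impute(series, k=3):
    """Fill each None with the mean of the k observed values nearest in time index."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
            filled[i] = sum(val for _, val in nearest) / len(nearest)
    return filled


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R2 (y_true must not be constant)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical hourly load for one week: a daily sinusoid around 1000 units.
load = [1000.0 + 200.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]

# Hide 6% of the values, impute them, and score only the hidden positions.
masked, missing_idx = mask_values(load, fraction=0.06)
imputed = knn_impute(masked, k=4)
metrics = regression_metrics(
    [load[i] for i in missing_idx],
    [imputed[i] for i in missing_idx],
)
print(metrics)
```

Swapping `knn_impute` for another regressor and repeating the loop at several missing-data fractions gives the kind of comparison the abstract reports, although the study itself judges each imputer by the accuracy of the forecasts trained on the imputed data.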
(2025). Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting. Retrieved from https://hdl.handle.net/10446/313065
Machine Learning-Based Imputation Approaches for Efficient Electrical Load Forecasting
Hussain, Ayaz; Giangrande, Paolo; Franchini, Giuseppe
2025-01-01
| File | Access | Version | License | File size | Format |
|---|---|---|---|---|---|
| C103_merged.pdf | Archive managers only | Publisher's version | Aisberg default license | 1.56 MB | Adobe PDF |

