Methods for Analyzing Electricity Consumption Data and their Implementation in Multi-Platform Architectures

Hussain, Ayaz

doi:10.13122/hussain-ayaz_phd2026-04-23

The evolving digital landscape of smart infrastructure has facilitated the widespread integration of Internet of Things (IoT) enabled metering applications, robust monitoring solutions, and cloud-oriented analytical systems. This technological progress generates large volumes of reliable data, which are necessary for forecast-based management of electrical energy consumption. However, these datasets are often incomplete because of sensor failures, communication interruptions, and hardware constraints, which considerably compromises the efficacy of Mid-Term Load Forecasting (MTLF) models. This doctoral research addresses these obstacles by proposing a comprehensive methodological framework that integrates missing-data modeling, state-of-the-art imputation methods, Machine Learning (ML), Deep Learning (DL), and hybrid ML-DL forecasting techniques specifically optimized for the energy analytics of smart buildings. The dissertation is built around four interrelated studies: two concentrate on analyzing and improving MTLF models, while the other two evaluate incomplete-data patterns and the impact of imputation techniques on subsequent forecasting accuracy. The empirical basis for this work is data from a 30,000 m² smart commercial building in northern Italy, together with meteorological data from NASA POWER and the Photovoltaic Geographical Information System (PVGIS). Missing-data mechanisms such as linear block missingness and random pointwise missingness were systematically introduced at different levels (5%-40%), and each scenario was evaluated with statistical, ML, DL, and hybrid imputation methods. The results indicated that hybrid imputation methods consistently produce lower reconstruction errors and improve subsequent forecasting stability compared with traditional approaches.
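The missingness experiment described above can be illustrated with a minimal sketch. It is not the thesis implementation: the synthetic hourly load curve, the function names, and the use of plain linear interpolation as the imputer are all assumptions made for illustration. The sketch masks a series with either random pointwise gaps or one contiguous block at a chosen rate, fills the gaps, and reports the reconstruction RMSE on the masked positions only.

```python
import math
import random


def inject_pointwise(series, rate, rng):
    """Return a copy with round(rate * n) randomly chosen points set to None."""
    n = len(series)
    k = round(rate * n)
    # Keep the first and last points so interpolation always has two anchors.
    missing = set(rng.sample(range(1, n - 1), k))
    return [None if i in missing else v for i, v in enumerate(series)]


def inject_block(series, rate, rng):
    """Return a copy with one contiguous gap of round(rate * n) points."""
    n = len(series)
    k = round(rate * n)
    # The gap covers start .. start + k - 1 and never touches either end.
    start = rng.randint(1, n - 1 - k)
    return [None if start <= i < start + k else v for i, v in enumerate(series)]


def interpolate_linear(series):
    """Fill None gaps by linear interpolation between the nearest observed points.

    Assumes the first and last values are observed.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def rmse_on_gaps(truth, masked, filled):
    """Reconstruction RMSE computed only where values were masked."""
    errs = [(t - f) ** 2 for t, m, f in zip(truth, masked, filled) if m is None]
    return math.sqrt(sum(errs) / len(errs))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical two-week hourly load with a daily cycle (arbitrary units).
    truth = [100 + 30 * math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]
    for rate in (0.05, 0.10, 0.20, 0.40):
        for name, inject in (("pointwise", inject_pointwise), ("block", inject_block)):
            masked = inject(truth, rate, rng)
            filled = interpolate_linear(masked)
            print(f"{name:9s} rate={rate:.2f} RMSE={rmse_on_gaps(truth, masked, filled):.3f}")
```

Scoring only the masked positions keeps the error from being diluted by observed points. Any other imputer, including the ML, DL, or hybrid ones, could be swapped in for `interpolate_linear` and compared under the same masks.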
A detailed analysis of an extensive range of forecasting models is performed, covering Random Forest, XGBoost, Support Vector Regression (SVR), Decision Trees, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and hybrid frameworks such as FireNet–XGBoost. Among these, the hybrid ML-DL frameworks deliver the strongest predictive performance, especially on imputed datasets. The analysis identifies a distinct relationship between the quality of imputation, the representation of temporal variables, and the resilience of forecasting strategies, and thus offers critical findings for reliable, data-supported energy management. Besides the methodological contributions, this thesis also presents a comprehensive software implementation: an interactive MTLF platform developed with Streamlit, Python, and modern ML/DL libraries. The platform supports end-to-end forecasting, i.e., data upload, data preprocessing, outlier elimination, imputation, model training, and performance assessment, along with advanced forecasting modules, all accessible through its web user interface. In addition to simplifying model selection and hyperparameter optimization through AutoML frameworks, the platform includes an integrated CO2 emissions analytics feature that provides crucial information about sustainability. The application demonstrates the practical validity of the research by bridging academic model design with real-world decision support tools.
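As a rough illustration of what a CO2 emissions analytics step can involve, the sketch below converts forecast energy consumption into emissions with a single grid emission factor. The abstract does not describe the platform's actual method, so the function name, the constant-factor model, and the numeric factor (a placeholder, not a measured value) are assumptions.

```python
def estimate_emissions_kg(load_kwh, factor_kg_per_kwh):
    """Convert per-period energy consumption (kWh) into CO2 emissions (kg).

    Assumes one constant emission factor for all periods; a real grid
    factor varies with time and generation mix.
    """
    if factor_kg_per_kwh < 0:
        raise ValueError("emission factor must be non-negative")
    return [kwh * factor_kg_per_kwh for kwh in load_kwh]


if __name__ == "__main__":
    # Hypothetical monthly forecasts (kWh) and a placeholder emission factor.
    forecast_kwh = [120_000.0, 110_000.0, 130_000.0]
    placeholder_factor = 0.5  # kg CO2 per kWh, illustrative only
    per_month = estimate_emissions_kg(forecast_kwh, placeholder_factor)
    print("per month (kg):", per_month)
    print("total (kg):", sum(per_month))
```

Taking the factor as a parameter makes the assumption explicit and lets a region-specific or time-varying value be substituted later.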
In summary, this dissertation advances the understanding of smart-building forecasting by offering a thoroughly validated methodological framework, emphasizing the crucial role of high-quality imputation, and making available a practical forecasting platform for operational planning, resource allocation, optimization, and sustainability-oriented decisions in modern intelligent buildings.

Hussain, A. (2026). Methods for Analyzing Electricity Consumption Data and their Implementation in Multi-Platform Architectures. Retrieved from https://hdl.handle.net/10446/325828 and http://dx.doi.org/10.13122/hussain-ayaz_phd2026-04-23