AI for Sustainable Building Operations: Data-Driven Anomaly Detection in Ventilation Systems
Abstract
Detecting deviations in building time series data is essential for robust heating, ventilation, and air conditioning (HVAC) operation and energy-efficient facility management. In practice, however, building management system (BMS) data are often incomplete and heterogeneous, and they lack reliable fault labels.
This paper presents a benchmarking and feasibility study of data-driven anomaly detection on multivariate air-handling unit (AHU) time series data under realistic deployment constraints. We construct a unified dataset and define a domain-informed rule-based baseline that serves as an interpretable operational reference and as a source of weak labels. We further evaluate classical unsupervised methods and representation-learning approaches based on Temporal Convolutional Network (TCN) and Time Series Mixer (TSMixer) autoencoders. Both are considered in two settings: a joint multivariate representation of all selected sensors, and subsystem-based representations in which sensors are grouped by AHU function. Additionally, SHapley Additive exPlanations (SHAP)-based attribution is used to improve interpretability by identifying sensor-level contributions to detected deviations.
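As a rough illustration of the setup described above, the following sketch shows how a rule-based baseline could produce weak labels over sliding windows, and how a subsystem view selects a subset of sensors. The sensor names, the subsystem grouping, the deviation threshold, and the window length are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical sensor layout; the paper's actual sensor selection is not given here.
SENSORS = ["supply_temp", "supply_setpoint", "fan_speed", "fan_power"]
SUBSYSTEMS = {
    "thermal": ["supply_temp", "supply_setpoint"],
    "fan": ["fan_speed", "fan_power"],
}


def sliding_windows(data, length, stride=1):
    """Cut a (time, sensors) array into (n_windows, length, sensors)."""
    starts = range(0, data.shape[0] - length + 1, stride)
    return np.stack([data[s:s + length] for s in starts])


def rule_weak_labels(windows, max_dev=2.0):
    """Illustrative rule: flag a window if the supply temperature deviates
    from its setpoint by more than max_dev (assumed kelvin) at every step."""
    temp = windows[:, :, SENSORS.index("supply_temp")]
    setpoint = windows[:, :, SENSORS.index("supply_setpoint")]
    return np.all(np.abs(temp - setpoint) > max_dev, axis=1)


def subsystem_view(windows, name):
    """Select the sensor columns that belong to one subsystem."""
    cols = [SENSORS.index(s) for s in SUBSYSTEMS[name]]
    return windows[:, :, cols]


# Synthetic demo data: 20 time steps with a sustained offset at steps 10-15.
data = np.zeros((20, 4))
data[:, 1] = 18.0        # setpoint
data[:, 0] = 18.0        # supply temperature tracks the setpoint
data[10:16, 0] = 23.0    # sustained 5 K deviation

windows = sliding_windows(data, length=4)
labels = rule_weak_labels(windows)
fan_windows = subsystem_view(windows, "fan")
```

In this sketch the weak labels mark only windows that lie entirely inside the deviation episode. An autoencoder in the joint setting would consume `windows` directly, while a subsystem model would consume a view such as `fan_windows`.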
The results show that rule-based methods capture explicitly defined conditions, while data-driven approaches identify additional statistically unusual and temporally structured deviations, with representation-learning models flagging 1.1–1.4% of windows in the global setting and up to 4.7% in subsystem-based analyses. High-consensus events (~0.8%) occur during temporally localized episodes with agreement across multiple models, indicating robust, structured deviations. These detections represent candidate anomalies that require further validation.
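One way to picture flagging rates and high-consensus events is a simple voting scheme over per-model window scores. The sketch below is an assumption for illustration only: the quantile-based threshold, the vote count, the model names used as dictionary keys, and the synthetic scores are not from the paper, which does not state its thresholding or consensus rule here.

```python
import numpy as np


def flag_top_fraction(scores, fraction):
    """Flag windows whose anomaly score exceeds the (1 - fraction) quantile."""
    threshold = np.quantile(scores, 1.0 - fraction)
    return scores > threshold


def consensus(flags_by_model, min_votes):
    """Return a boolean mask of windows flagged by at least min_votes models."""
    votes = np.sum(np.stack(list(flags_by_model.values())), axis=0)
    return votes >= min_votes


# Synthetic scores: independent noise per model plus one shared, localized episode.
rng = np.random.default_rng(0)
n_windows = 1000
shared = np.zeros(n_windows)
shared[400:408] = 10.0

scores = {
    name: rng.normal(size=n_windows) + shared
    for name in ("tcn_ae", "tsmixer_ae", "iforest")  # hypothetical model labels
}
flags = {name: flag_top_fraction(s, 0.012) for name, s in scores.items()}
high_consensus = consensus(flags, min_votes=3)
```

Each model flags about 1.2% of windows on its own, but only the shared episode is likely to collect votes from all models, which mirrors the idea that agreement across methods points to structured rather than incidental deviations.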
Overall, combining rule-based, classical, and representation-learning methods provides complementary insights into AHU behavior and helps screen for relevant deviations in performance and energy use.
Keywords: HVAC anomaly detection, multivariate time series, unsupervised learning, representation learning, energy-efficient building operation
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author's copyright; rather, it allows authors to share some of their rights with any member of the public under certain conditions while retaining full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions, including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.