Promoting Explainability in Data-Driven Models for Anomaly Detection: A Step Toward Diagnosis
Abstract
Anomaly detection has become a critical task in industry. Data-driven models are often used for anomaly detection due to their ability to learn patterns from data and identify behaviors that deviate from the learned patterns. Furthermore, they are simple to implement as they do not rely on complex physical models to make predictions. However, one major limitation of these models is their lack of explainability, which hinders the diagnosis of detected anomalies.
Explainability provides transparency and interpretability, allowing stakeholders to understand the reasons for a detected deviation. In the absence of explainability, it is challenging to determine why a particular instance was classified as abnormal. Without an understanding of the underlying reason for the anomaly, it becomes difficult to reach a reliable diagnosis. This can result in missed opportunities to prevent or mitigate the damage caused by the anomaly. Explainability can also help in detecting false positives and false negatives, in particular by distinguishing abnormal behavior from sensor failures.
Hydro-Quebec is the principal actor in electricity management in Quebec, Canada. The overwhelming majority of production comes from hydroelectric generating units. Power grid sustainability therefore depends strongly on efficient health supervision of these assets. In this study, we introduce a data-driven, semi-supervised algorithm for anomaly detection, with an emphasis on statistical explainability. This feature must be distinguished from traditional explainable models, which build upon physics to interpret observations. Here, the purpose is to track the sources of deviations through statistics. The model is not a diagnosis tool, because its output alone is not sufficient to find the root causes of a problem. However, it builds a bridge toward such tools by providing clues about the origin of failures.
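As a rough illustration of what tracking deviation sources through statistics can look like, the sketch below standardizes per-channel residuals and reports each channel's share of a global deviation indicator. It is not the authors' indicator: the function `deviation_contributions`, the channel names, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def deviation_contributions(observed, expected, sigma, names):
    """Return a global deviation indicator and each channel's share of it."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Standardized residual per channel (deviation in units of normal scatter).
    z = (observed - expected) / sigma
    squared = z ** 2

    # Global indicator: root mean square of the standardized residuals.
    global_indicator = float(np.sqrt(squared.mean()))

    # Share of the total squared deviation carried by each channel.
    total = squared.sum()
    shares = squared / total if total > 0 else np.zeros_like(squared)

    # Rank channels from the largest contributor to the smallest.
    ranking = sorted(zip(names, z, shares), key=lambda t: -t[2])
    return global_indicator, ranking

# Hypothetical channels and values, for illustration only.
names = ["bearing_temp", "vibration", "guide_vane_opening"]
expected = [55.0, 1.20, 0.62]   # assumed model predictions
sigma = [0.5, 0.05, 0.01]       # assumed normal scatter per channel
observed = [55.2, 1.45, 0.62]   # assumed current measurements

indicator, ranking = deviation_contributions(observed, expected, sigma, names)
for name, z_value, share in ranking:
    print(f"{name}: z = {z_value:+.2f}, share = {share:.1%}")
```

A ranking of this kind points to the channel that deviates most, but, as the abstract notes, such a clue is not by itself a root-cause diagnosis.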
The algorithm operates in two stages. First, a model is trained to learn the normal behavior of the generating unit for a given set of operating conditions. This stage involves clustering for data reduction and kriging for regression. Second, the algorithm compares the multidimensional prediction with the actual realization. It quantifies the deviation of the asset from its expected behavior and provides an explainable indicator for anomaly detection.
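The two stages can be sketched on synthetic data as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which clustering method or kriging variant is used, so plain k-means and simple kriging with a Gaussian covariance are swapped in here. The names `kmeans_reduce`, `SimpleKriging`, and `explain`, the parameters `length` and `nugget`, and the synthetic channels are all hypothetical.

```python
import numpy as np

def kmeans_reduce(X, Y, k, n_iter=20, seed=0):
    """Stage 1a: reduce the history to cluster centroids (Lloyd's k-means on X)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    keep = [j for j in range(k) if np.any(labels == j)]  # skip empty clusters
    Xc = np.array([X[labels == j].mean(axis=0) for j in keep])
    Yc = np.array([Y[labels == j].mean(axis=0) for j in keep])
    return Xc, Yc

def gauss_cov(A, B, length):
    """Gaussian correlation between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / length ** 2)

class SimpleKriging:
    """Stage 1b: simple kriging of each response channel on the centroids."""
    def __init__(self, Xc, Yc, length=0.3, nugget=1e-2):
        self.Xc, self.length, self.nugget = Xc, length, nugget
        self.mean = Yc.mean(axis=0)
        self.scale = Yc.std(axis=0) + 1e-12
        K = gauss_cov(Xc, Xc, length) + nugget * np.eye(len(Xc))
        self.K_inv = np.linalg.inv(K)
        self.weights = self.K_inv @ ((Yc - self.mean) / self.scale)

    def predict(self, x):
        k = gauss_cov(np.atleast_2d(x), self.Xc, self.length)  # shape (1, n)
        mu = self.mean + self.scale * (k @ self.weights)[0]
        var = 1.0 + self.nugget - (k @ self.K_inv @ k.T)[0, 0]
        return mu, self.scale * np.sqrt(max(var, 1e-9))

def explain(model, x, y_obs, names):
    """Stage 2: standardized deviation of each channel from its prediction."""
    mu, sd = model.predict(x)
    z = (np.asarray(y_obs, dtype=float) - mu) / sd
    return {name: float(v) for name, v in zip(names, z)}

# Synthetic "normal" history: two operating conditions, two response channels.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(2000, 2))
Y = np.column_stack([50 + 10 * X[:, 0], 1 + 0.5 * X[:, 1]])
Y = Y + rng.normal(0, [0.2, 0.02], size=(2000, 2))

Xc, Yc = kmeans_reduce(X, Y, k=40)   # 2000 samples reduced to at most 40
model = SimpleKriging(Xc, Yc)

x_new = np.array([0.5, 0.5])
y_faulty = np.array([55.0, 1.60])    # vibration shifted away from about 1.25
scores = explain(model, x_new, y_faulty, ["temperature", "vibration"])
print(scores)
```

In this sketch the per-channel scores play the role of the explainable indicator: a large absolute score on one channel and small scores elsewhere indicate where the deviation originates. Fitting the regression on centroids rather than on raw samples keeps the kriging matrix small, which is the data-reduction role the abstract assigns to clustering.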
After introducing the foundations of the method, we give examples that demonstrate the advantage of interpretability in supporting operation and diagnosis. We show how such an algorithm can be deployed in an operational environment and how it should be combined with other tools to improve asset health management.
Keywords
data-driven models, anomaly detection, semi-supervised learning, hydroelectric units
This work is licensed under a Creative Commons Attribution 3.0 Unported License.