Promoting Explainability in Data-Driven Models for Anomaly Detection: A Step Toward Diagnosis



Published Oct 26, 2023
Quentin Dollon, Paul Labbé, François Léonard


Anomaly detection has become a critical task in industry. Data-driven models are often used for anomaly detection due to their ability to learn patterns from data and identify behaviors that deviate from the learned patterns. Furthermore, they are simple to implement as they do not rely on complex physical models to make predictions. However, one major limitation of these models is their lack of explainability, which hinders the diagnosis of detected anomalies.

Explainability provides transparency and interpretability, allowing stakeholders to understand the reasons for a detected deviation. Without it, it is challenging to determine why a particular instance was classified as abnormal, and without an understanding of the underlying reason for the anomaly it becomes difficult to reach a reliable diagnosis. This can result in missed opportunities to prevent or mitigate damage caused by the anomaly. Explainability can also help in detecting false positives and false negatives, in particular by distinguishing abnormal behaviors from sensor failures.

Hydro-Quebec is the principal actor in electricity management in Quebec, Canada. The overwhelming majority of its production comes from hydroelectric generating units. Power grid sustainability therefore depends strongly on the efficient health supervision of these assets. In this study, we introduce a data-driven semi-supervised algorithm for anomaly detection, with emphasis on statistical explainability. This feature needs to be distinguished from traditional explainable models, which build upon physics to interpret observations. Here, the purpose is to track the sources of deviations through statistics. This model is not a diagnosis tool, because its output alone is not sufficient to find the root causes of a problem. However, it builds a bridge toward such tools by providing clues about the origin of failures.

The algorithm operates in two stages. First, a model is trained to learn the normal behavior of the generating unit for a given set of operating conditions. This part involves clustering for data reduction and kriging for regression. Second, it compares the multidimensional prediction with the actual realization. It quantifies the deviation of the asset from its expected behavior and provides an explainable indicator for anomaly detection.
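The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means stands in for the clustering step, simple kriging with a Gaussian covariance stands in for the regression step, and the indicator is assumed to be the root mean square of standardized residuals, with each channel's share of the squared deviation serving as the explainable part. All function names, the length scale, the nugget, and the synthetic data are hypothetical.

```python
import numpy as np

def reduce_by_clustering(X, Y, k, n_iter=25, seed=0):
    """Stage 1a (assumed k-means): summarize the data by cluster centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    used = np.unique(labels)  # ignore clusters that ended up empty
    Xc = np.array([X[labels == j].mean(axis=0) for j in used])
    Yc = np.array([Y[labels == j].mean(axis=0) for j in used])
    return Xc, Yc

def gauss_cov(A, B, length):
    """Gaussian covariance between two sets of operating points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

class SimpleKriging:
    """Stage 1b (assumed simple kriging): regress measurements on conditions."""
    def __init__(self, Xc, Yc, length=0.3, nugget=1e-3):
        self.Xc, self.length = Xc, length
        self.mean = Yc.mean(axis=0)
        K = gauss_cov(Xc, Xc, length) + nugget * np.eye(len(Xc))
        self.weights = np.linalg.solve(K, Yc - self.mean)

    def predict(self, X):
        return self.mean + gauss_cov(X, self.Xc, self.length) @ self.weights

def deviation(model, sigma, x, y):
    """Stage 2: global indicator plus per-channel contributions (assumed form)."""
    z = (y - model.predict(x[None, :])[0]) / sigma  # standardized residuals
    indicator = float(np.sqrt(np.mean(z**2)))
    contributions = z**2 / np.sum(z**2)             # shares summing to 1
    return indicator, contributions

# Synthetic "normal" data: one operating condition, two measured channels.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 1))
Y = np.column_stack([2.0 * X[:, 0], X[:, 0] ** 2])
Y = Y + 0.05 * rng.standard_normal((500, 2))

Xc, Yc = reduce_by_clustering(X, Y, k=20)
model = SimpleKriging(Xc, Yc)
sigma = (Y - model.predict(X)).std(axis=0)  # residual spread per channel

x0 = np.array([0.5])
ind_ok, _ = deviation(model, sigma, x0, np.array([1.0, 0.25]))
# Inject an offset on the second channel only.
ind_bad, contrib = deviation(model, sigma, x0, np.array([1.0, 0.75]))
print(ind_ok, ind_bad, contrib)
```

In this sketch the per-channel contributions point to the channel responsible for a high indicator, which is the kind of statistical clue the abstract describes. They do not identify a root cause.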

After introducing the foundations of the method, some examples are given that demonstrate the advantage of interpretability in supporting operation and diagnosis. It will be shown how such an algorithm can be deployed in an operational environment and how it should be combined with other tools to improve asset health management.

How to Cite

Dollon, Q., Labbé, P., & Léonard, F. (2023). Promoting Explainability in Data-Driven Models for Anomaly Detection: A Step Toward Diagnosis. Annual Conference of the PHM Society, 15(1).



data-driven models, anomaly detection, semi-supervised learning, hydroelectric units

Technical Research Papers