Explainable Predictive Maintenance is Not Enough: Quantifying Trust in Remaining Useful Life Estimation



Published Oct 26, 2023
Ripan Kumar Kundu, Khaza Anuarul Hoque


Machine learning (ML) and deep learning (DL) have shown tremendous success in data-driven predictive maintenance (PdM). However, operators and technicians often need to understand what is happening, why it is happening, and how to react, and black-box models cannot provide these insights. This is a major obstacle to adopting PdM, because such models cannot support experts in making maintenance decisions based on the problems they detect. Motivated by this, several researchers have recently used post-hoc explanation methods and tools, such as LIME and SHAP, to explain the remaining useful life (RUL) predicted by these black-box models. Unfortunately, post-hoc explanation methods often suffer from the disagreement problem, which occurs when multiple explainable AI (XAI) tools differ in their feature ranking. Hence, explainable PdM models that rely on these methods are not trustworthy, as such unstable explanations may lead to catastrophic consequences in safety-critical PdM applications. This paper proposes a novel framework to address this problem. First, we use three state-of-the-art explanation methods (LIME, SHAP, and Anchor) to explain the predicted RUL from three ML-based PdM models, namely extreme gradient boosting (XGB), random forest (RF), and logistic regression (LR), and from one feed-forward neural network (FFNN)-based PdM model, using the C-MAPSS dataset. We show that the ranking of dominant features for RUL prediction differs across explanation methods. Then, we propose a new metric, the trust score, for selecting the proper explanation method. This is achieved by evaluating the XAI methods with four evaluation metrics (fidelity, stability, consistency, and identity) and then combining them into a single trust score using the Kemeny and Borda rank aggregation methods. Our results show that the proposed method effectively selects the most appropriate explanation method from a set of explanation methods for estimated RULs.
To the best of our knowledge, this is the first work that attempts to address and solve the disagreement problem in explainable PdM.
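
The aggregation step described above can be illustrated with a minimal sketch. It assumes each of the four evaluation metrics yields a best-first ranking of the three explanation methods, and it combines those rankings with a Borda count and a brute-force Kemeny consensus. The rankings below are invented for illustration and are not results from the paper, and the function names (`borda`, `kemeny`, `kendall_tau_distance`) are hypothetical. How the paper turns the aggregated ranking into its trust score is not stated in the abstract, so that step is not reproduced here.

```python
from itertools import permutations

# Hypothetical best-first rankings per evaluation metric (illustrative only,
# NOT results reported in the paper).
rankings = {
    "fidelity":    ["SHAP", "LIME", "Anchor"],
    "stability":   ["SHAP", "Anchor", "LIME"],
    "consistency": ["LIME", "SHAP", "Anchor"],
    "identity":    ["SHAP", "LIME", "Anchor"],
}


def borda(rankings):
    """Borda count: position p in an n-item ranking earns n - 1 - p points."""
    scores = {}
    for order in rankings.values():
        n = len(order)
        for pos, item in enumerate(order):
            scores[item] = scores.get(item, 0) + (n - 1 - pos)
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return ordered, scores


def kendall_tau_distance(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    distance = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if pos_b[a[i]] > pos_b[a[j]]:
                distance += 1
    return distance


def kemeny(rankings):
    """Brute-force Kemeny consensus: the permutation with the smallest total
    Kendall tau distance to all input rankings (feasible for few items)."""
    items = sorted(next(iter(rankings.values())))
    best = min(
        permutations(items),
        key=lambda cand: sum(
            kendall_tau_distance(cand, r) for r in rankings.values()
        ),
    )
    return list(best)


borda_order, borda_scores = borda(rankings)
kemeny_order = kemeny(rankings)
print("Borda:", borda_order, borda_scores)
print("Kemeny:", kemeny_order)
```

With these invented inputs both aggregators place the same method first. On other inputs they can disagree, and the brute-force Kemeny search grows factorially with the number of methods, so it only suits small candidate sets like this one.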

How to Cite

Kundu, R. K., & Hoque, K. A. (2023). Explainable Predictive Maintenance is Not Enough: Quantifying Trust in Remaining Useful Life Estimation. Annual Conference of the PHM Society, 15(1). https://doi.org/10.36001/phmconf.2023.v15i1.3472



explainable AI, XAI, trustworthy, remaining useful life, turbofan engine

Technical Research Papers