Many industrial sectors collect large volumes of sensor data. With recent technologies for processing big data, companies can exploit these data for automatic failure detection and prevention. We propose the first completely automated method for failure analysis that machine-learns fault trees from raw observational data with continuous variables. Our method scales well and is tested on a real-world, five-year dataset of domestic heater operations in the Netherlands, with 31 million unique heater-day readings, each containing 27 sensor variables and 11 failure variables. Our method builds on two previous procedures: the C4.5 decision-tree learning algorithm, and the LIFT algorithm for learning fault trees from Boolean data. C4.5 pre-processes each continuous variable: it learns an optimal numerical threshold that distinguishes between faulty and normal operation of the top-level system. These thresholds discretise the variables, allowing LIFT to learn explainable fault trees that model the root failure mechanisms of the system. We obtain fault trees for the 11 failure variables, and evaluate them in two ways: quantitatively, with a significance score, and qualitatively, with domain specialists. Some of the learnt fault trees have almost maximum significance (above 0.95), while others have medium-to-low significance (around 0.30), reflecting the difficulty of learning from big, noisy, real-world sensor data. The domain specialists confirm that the fault trees model meaningful relationships among the variables.
Big data, Fault Trees, Machine learning, Continuous data, Industrial case study, RAMS
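The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single continuous sensor variable and a Boolean failure label, and it picks the cut point that maximises information gain, in the spirit of an entropy-based C4.5 split. The function names (`entropy`, `best_threshold`, `discretise`), the use of the midpoint between neighbouring values as the candidate threshold, and the toy temperature readings are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[bool]) -> float:
    """Shannon entropy (in bits) of a sequence of Boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_threshold(values: Sequence[float],
                   failed: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, information_gain) of the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption of this sketch).
    """
    pairs = sorted(zip(values, failed))
    n = len(pairs)
    labels = [f for _, f in pairs]
    base = entropy(labels)
    best_t, best_gain = pairs[0][0], 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        left, right = labels[:i], labels[i:]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain


def discretise(values: Sequence[float], threshold: float) -> List[bool]:
    """Turn a continuous variable into a Boolean one: True if above threshold."""
    return [v > threshold for v in values]


# Hypothetical toy data: one sensor reading per heater-day and a failure flag.
temperature = [40.0, 45.0, 50.0, 80.0, 85.0, 90.0]
failure = [False, False, False, True, True, True]

threshold, gain = best_threshold(temperature, failure)
boolean_column = discretise(temperature, threshold)
```

On this toy data the failures are perfectly separated, so the learnt threshold falls between 50 and 80. Applying `discretise` to every continuous variable would yield the kind of Boolean dataset that a fault tree learner such as LIFT takes as input.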
Banner. (2021). Sure Cross QM30VT2 vibration and temperature sensor. (Datasheet)
Berk, J. (2009). System failure analysis. ASM International.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Routledge.
Chen, S., Ho, T., & Mao, B. (2007). Reliability evaluations of railway power supplies by fault-tree analysis. IET Electric Power Applications, 1(2), 161–172. doi: 10.1049/iet-epa:20060244
Cramér, H. (1946). Mathematical methods of statistics (PMS-9). Princeton University Press. doi: 10.1515/9781400883868
Evans, D. (2011). The internet of things: How the next evolution of the internet is changing everything (Tech. Rep.). CISCO; San Jose, CA, U.S.: Cisco Internet Business Solutions Group (IBSG).
Griffor, E. (2016). Handbook of system safety and security. doi: 10.1016/C2014-0-05033-2
Intergas. (2018). Combi Compact HRE. Installation, service and user instructions. (Installation manual 88287806)
Lazarova-Molnar, S., Niloofar, P., & Barta, G. K. (2020). Data-driven fault tree modeling for reliability assessment of cyber-physical systems. In WSC (pp. 2719–2730). IEEE. doi: 10.1109/WSC48552.2020.9383882
Lee, C., Alena, R., & Robinson, P. (2005). Migrating fault trees to decision trees for real time fault detection on International Space Station. In 2005 IEEE aerospace conference (pp. 1–6). doi: 10.1109/AERO.2005.1559584
Linard, A., Bucur, D., & Stoelinga, M. (2019). Fault trees from data: Efficient learning with an evolutionary algorithm. In SETTA (Vol. 11951, pp. 19–37). Springer. doi: 10.1007/978-3-030-35540-1_2
Linard, A., Bueno, M. L., Bucur, D., & Stoelinga, M. (2019). Induction of fault trees through Bayesian networks. In ESREL. doi: 10.3850/978-981-11-2724-3_0596-cd
Nagaraju, V., Fiondella, L., & Wandji, T. (2017). A survey of fault and attack tree modeling and analysis for cyber risk management. In HST (pp. 1–6). IEEE. doi: 10.1109/THS.2017.7943455
Nauta, M., Bucur, D., & Stoelinga, M. (2018). LIFT: Learning fault trees from observational data. In QEST (Vol. 11024, pp. 306–322). Springer. doi: 10.1007/978-3-319-99154-2_19
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Ruijters, E., Budde, C. E., Nakhaee, M. C., Stoelinga, M., Bucur, D., Hiemstra, D., & Schivo, S. (2019). FFORT: A benchmark suite for fault tree analysis. In ESREL. doi: 10.3850/978-981-11-2724-3_0641-cd
Ruijters, E., & Stoelinga, M. (2015). Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review, 15–16, 29–62. doi: 10.1016/j.cosrev.2015.03.001
Ton, B., Basten, R., Bolte, J., Braaksma, J., Di Bucchianico, A., van de Calseyde, P., . . . Stoelinga, M. (2020). PrimaVera: Synergising predictive maintenance. Applied Sciences, 10(23). doi: 10.3390/app10238348
Vesely, W., Stamatelatos, M., Dugan, J., Fragola, J., Minarick, J., & Railsback, J. (2002). Fault tree handbook with aerospace applications. NASA Office of Safety and Mission Assurance. (version 1.1)
Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75(6), 579–652.