Time-to-event (TTE) modeling with survival analysis in industrial settings faces the challenge of premature replacements of machine components, which bias survival predictions and increase their error. Typically, TTE survival data records, for each component, whether it has failed up to a certain time. For a failed component, the failure time is noted, and the failure is referred to as an event. A component that has not failed is said to be censored. In industrial settings, in contrast to medical settings, there can be considerable uncertainty about whether an event is a true failure: a component can be replaced before it fails, either to prevent operation stops or because maintenance staff believe that it is faulty. This shows up as “no fault found” in warranty studies, where a significant proportion of replaced components may appear fault-free when tested or inspected after replacement.
In this work, we propose an expectation-maximization-like method for discovering such premature replacements in survival data. The method is a two-phase iterative algorithm that employs a genetic algorithm in the maximization phase to learn better event assignments on a validation set. The labels learned across iterations are accumulated and averaged, and the averages are used to initialize the following expectation phase. The assumption is that the more often an event is selected, the more likely it is to be an actual failure and not a “no fault found”.
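To make the effect of a mislabeled event concrete, the following minimal sketch compares a survival curve computed from the recorded labels with one where a premature replacement is treated as censored. The Kaplan-Meier estimator is used here only as a simple illustration and is not taken from this work; the `Record` type and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    time: float   # observed time: failure, replacement, or end of observation
    event: bool   # True = recorded as a failure, False = censored


def kaplan_meier(records):
    """Return [(t, S(t))] at each distinct event time (Kaplan-Meier estimate)."""
    surv = 1.0
    curve = []
    event_times = sorted({r.time for r in records if r.event})
    for t in event_times:
        at_risk = sum(1 for r in records if r.time >= t)
        failures = sum(1 for r in records if r.event and r.time == t)
        surv *= 1.0 - failures / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data: the component replaced at t=2 was actually healthy
# ("no fault found"), but the log records the replacement as a failure.
recorded = [Record(2, True), Record(5, True), Record(8, True), Record(10, False)]
corrected = [Record(2, False), Record(5, True), Record(8, True), Record(10, False)]

print(kaplan_meier(recorded))
print(kaplan_meier(corrected))
```

With these four toy records, the estimated survival probability at t=5 is 0.50 under the recorded labels and about 0.67 once the premature replacement is treated as censored, which illustrates the direction of the bias.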
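The two-phase loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the GA operators (truncation selection, one-point crossover, bit-flip mutation), and all hyperparameters are hypothetical, and the expectation phase is reduced to a user-supplied hook `expectation_step` that returns a fitness function. In a real setting that hook would presumably fit a survival model and score candidate event assignments on a validation set; the toy fitness below compares against hidden true labels purely so that the sketch runs.

```python
import random


def genetic_search(n, fitness, seed_probs, rng,
                   pop_size=20, generations=30, mut_rate=0.05):
    """Maximization phase (sketch): search binary event masks maximizing `fitness`."""
    # Initial population is sampled from the current per-sample event probabilities.
    pop = [[rng.random() < p for p in seed_probs] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation keeps the population diverse.
            child = [(not g) if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def discover_events(n, expectation_step, rounds=10, seed=0):
    """Accumulate and average the event labels selected in each round."""
    rng = random.Random(seed)
    probs = [1.0] * n                             # initially trust every recorded event
    totals = [0] * n
    for r in range(1, rounds + 1):
        fitness = expectation_step(probs)         # hypothetical expectation-phase hook
        best = genetic_search(n, fitness, probs, rng)
        totals = [t + int(b) for t, b in zip(totals, best)]
        probs = [t / r for t in totals]           # running average initializes next round
    return probs


# Toy stand-in for validation scoring (ASSUMPTION: true labels are unknown in practice).
truth = [True] * 9 + [False] * 3                  # last three are premature replacements


def toy_expectation_step(probs):
    return lambda mask: sum(m == t for m, t in zip(mask, truth))


probs = discover_events(len(truth), toy_expectation_step)
print([round(p, 2) for p in probs])
```

Entries of `probs` close to 1 correspond to events that were selected in most rounds, which under the stated assumption are more likely to be actual failures; entries close to 0 are candidates for “no fault found”.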
Experiments on synthesized and simulated data show that the proposed method can correctly detect a significant percentage of premature replacement cases.
Survival Analysis, Predictive Maintenance, Early Replacements, Genetic Algorithms
This work is licensed under a Creative Commons Attribution 3.0 Unported License.