Towards Learning Causal Representations of Technical Word Embeddings for Smart Troubleshooting



Published Jul 18, 2022
Alexandre Trilla, Nenad Mijatovic, Xavier Vilasis-Cardona


This work explores how the causal inference paradigm may be applied to troubleshoot the root causes of failures through language processing and Deep Learning. To do so, the causal hierarchy is taken as a reference: the associative, interventional, and retrospective levels of causality are investigated within textual data in the form of a failure analysis ontology and a set of written Return On Experience records. A novel approach to extracting linguistic knowledge is devised through the joint embedding of two contextualized Bag-Of-Words models, which defines both a probabilistic framework and a distributed representation of the underlying causal semantics. The method is applied to the maintenance of rolling stock bogies, and the results indicate that the inference of causality is partially attained with the currently available technical documentation (consensus over 70%). However, some disagreement remains between root causes and problems, which leads to confusion and uncertainty. Consequently, the proposed approach may be used as a strategy to detect lexical imprecision, make writing recommendations in the form of standard reporting guidelines, and ultimately help produce clearer diagnosis materials to increase the safety of the railway service.




technical, language, processing, causality, representation, embedding, word, troubleshooting

Technical Papers