fmdtools: A Fault Propagation Toolkit for Resilience Assessment in Early Design



Published Apr 7, 2021
Daniel Hulse, Hannah Walsh, Andy Dong, Christopher Hoyle, Irem Tumer, Chetan Kulkarni, Kai Goebel


Incorporating resilience in design is important for the long-term viability of complex engineered systems. Complex aerospace systems, for example, must ensure safety in the event of hazards resulting from part failures and external circumstances while maintaining efficient operations. Traditionally, mitigating hazards in early design has involved experts manually creating hazard analyses, a time-consuming process that hinders one's ability to compare designs. Furthermore, unlike reliability-based design, resilience-based design requires models that determine the dynamic effects of faults so that recovery schemes can be compared. Models also create design opportunities: they can be parameterized and optimized, and the resulting hazard analyses can be updated iteratively. While many theoretical frameworks have been presented for early hazard assessment, most currently available modeling tools are meant for the later stages of design. Given the wide adoption of Python in the broader research community, there is an opportunity to create an environment in which researchers can study the resilience of different prognostics and health management (PHM) technologies in the early phases of design. This paper describes fmdtools, an attempt to realize this opportunity with a set of modules that may be used to construct different design models, simulate system behaviors over a set of fault scenarios, and analyze the resilience indicated by the simulation results. This approach is demonstrated in the hazard analysis and architecture design of a multi-rotor drone, showing how the toolkit enables a large number of analyses to be performed on a relatively simple model as it progresses through the early design process.
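The abstract describes a three-stage workflow: construct a model, simulate it over a set of fault scenarios, and analyze the resilience of the results. The following is a minimal sketch of that workflow in plain Python on a toy multi-rotor drone. It does not use fmdtools and does not reproduce its API; the functions (`battery`, `controller`, `rotors`), the fault modes, the rates and costs, and the rate-weighted cost metric are all hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical single-fault modes: (function, mode) -> (rate per flight, repair cost).
# All values are illustrative assumptions, not data from the paper.
FAULT_MODES = {
    ("battery", "depleted"): (1e-4, 500.0),
    ("controller", "stuck"): (5e-5, 2000.0),
    ("rotors", "one_lost"): (2e-4, 800.0),
}


@dataclass
class DroneState:
    energy: float = 100.0   # remaining battery energy (%)
    altitude: float = 0.0   # current altitude (m)
    faults: set = field(default_factory=set)


def step(state: DroneState, target_alt: float = 10.0) -> None:
    """Advance the toy model one time step; active faults alter the flows."""
    has_power = state.energy > 0 and ("battery", "depleted") not in state.faults
    has_control = ("controller", "stuck") not in state.faults
    full_thrust = ("rotors", "one_lost") not in state.faults
    if has_power:
        state.energy -= 1.0
    if has_power and has_control and full_thrust:
        state.altitude = min(target_alt, state.altitude + 2.0)  # nominal climb/hold
    elif has_power and has_control:
        state.altitude = max(0.0, state.altitude - 0.5)         # degraded: slow descent
    else:
        state.altitude = max(0.0, state.altitude - 5.0)         # power/control lost: fall


def simulate(fault=None, fault_time: int = 5, duration: int = 20) -> list:
    """Run one scenario, injecting `fault` (if any) at `fault_time`."""
    state = DroneState()
    history = []
    for t in range(duration):
        if fault is not None and t == fault_time:
            state.faults.add(fault)
        step(state)
        history.append(state.altitude)
    return history


def assess(cost_per_m_step: float = 10.0) -> dict:
    """Simulate every single-fault scenario and return its rate-weighted cost."""
    nominal = simulate()
    results = {}
    for fault, (rate, repair_cost) in FAULT_MODES.items():
        faulty = simulate(fault)
        # Resilience loss: accumulated altitude deviation from the nominal run.
        deviation = sum(abs(n - f) for n, f in zip(nominal, faulty))
        results[fault] = rate * (repair_cost + cost_per_m_step * deviation)
    return results


if __name__ == "__main__":
    costs = assess()
    for (fxn, mode), cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        print(f"{fxn:>10s} {mode:<9s} expected cost = {cost:.4f}")
    print(f"total expected cost = {sum(costs.values()):.4f}")
```

In this toy setup the lost-rotor scenario ranks highest: its degraded descent deviates less from the nominal run than a total loss of power or control does, but its assumed rate is higher. Changing the model (for example, adding a redundant rotor) and rerunning `assess` is the kind of early-design iteration the abstract describes.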




Fault Propagation Toolkit, Resilience Assessment, Design
