A Comprehensive Approach to Fault Classification of Helicopter Engines with Adaboost Ensemble Model


Published Nov 6, 2024
Peeyush Pankaj, Sammit Jain, Shyam Joshi

Abstract

This work is based on the PHM North America 2024 Conference Data Challenge's datasets of helicopter turbine engine performance measurements. These datasets were large and moderately imbalanced. To address these challenges, we demonstrate a set of tools covering feature engineering, augmentation and selection, model exploration, visualization, model explainability, and confidence margin estimation. This work was performed entirely in MATLAB. All of these tools are generally applicable to data-driven modeling and health prediction in real-life applications.

Initially, we explored the 742k observations in the training set, noting a 60-40 split between healthy and faulty labels, and identified two major operational clusters within the data. We enhanced the dataset by removing duplicates and engineering new features based on domain knowledge, expanding the feature set to 242 dimensions.

For torque margin estimation, we trained a regression model on a limited subset of 18 features, comprising domain-knowledge engineered features, quadratic terms, and linear interactions among all terms. For the final submission, we utilized a stepwise linear regression model to optimize feature selection. This approach achieved a perfect regression score on test data, validated by a consistent torque margin residual range of ±0.5%. The model's low RMSE and MAE metrics supported employing a normal-distribution probability density function for confidence estimation.
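A rough sketch of this pipeline, under stated assumptions: the paper used MATLAB's stepwise linear regression, while the snippet below substitutes scikit-learn's forward `SequentialFeatureSelector` as an analogue, and the data is synthetic (8 candidate features, of which only the first three drive the target).

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic stand-in for the torque-margin design matrix:
# only the first three columns actually carry signal.
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Forward selection as a rough analogue of stepwise linear regression.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
X_sel = selector.transform(X)

model = LinearRegression().fit(X_sel, y)
residuals = y - model.predict(X_sel)
rmse = float(np.sqrt(np.mean(residuals**2)))

# Fitting a normal PDF over the residuals turns the point prediction into a
# confidence estimate, as described for the torque-margin submission.
density_at_zero = norm.pdf(0.0, loc=residuals.mean(), scale=residuals.std())
print(sorted(selector.get_support(indices=True)), round(rmse, 3))
```

With a tight, roughly symmetric residual distribution (the ±0.5% band above), the fitted normal density gives a simple, well-calibrated confidence margin around each torque-margin prediction.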

For the fault classification task, we reduced the feature set to 58 using dimensionality reduction techniques and balanced the data by upsampling and down-weighting the minority class. We employed ASHA (Asynchronous Successive Halving Algorithm) in conjunction with AutoML to efficiently determine the most suitable model family, significantly saving compute time. Subsequently, we trained ensemble models, including bagged trees and AdaBoost (Adaptive Boosting), which minimized false negatives and false positives, achieving robust classification performance. This was particularly critical given the high penalty for false negatives in the data challenge. The MathWorks team score on the testing data was 0.9686 at the close of the competition. This was further improved to 0.9867.
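The cost-sensitive boosting idea can be illustrated as follows. This is a scikit-learn sketch, not the paper's MATLAB implementation: the data is synthetic (deliberately more skewed than the challenge's 60-40 split), and the minority-class weight of 4.0 is an assumed value standing in for the challenge's false-negative penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Synthetic imbalanced data; class 1 plays the role of the "faulty" minority.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost-sensitive training: weight minority-class (faulty) samples more heavily,
# mirroring the high false-negative penalty in the challenge metric.
weights = np.where(y_tr == 1, 4.0, 1.0)

clf = AdaBoostClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr, sample_weight=weights)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"false negatives: {fn}, false positives: {fp}")
```

Raising the minority-class weight trades false negatives for false positives, which is the right direction when missed faults carry the dominant penalty.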

Our approach demonstrates the effectiveness of combining strategic data processing, feature engineering, and model selection to enhance predictive accuracy in complex operational datasets.

How to Cite

Pankaj, P., Jain, S., & Joshi, S. (2024). A Comprehensive Approach to Fault Classification of Helicopter Engines with Adaboost Ensemble Model. Annual Conference of the PHM Society, 16(1). https://doi.org/10.36001/phmconf.2024.v16i1.4194


Keywords

Fault Classification, Machine Learning, Adaptive Boosting, Feature Engineering, Shapley, Explainable AI, Helicopter Engines

References
Bechhoefer, E., & Hajimohammadali, M. (2023). Process for Turboshaft Engine Performance Trending. Annual Conference of the PHM Society, 15(1). https://doi.org/10.36001/phmconf.2023.v15i1.3490

MathWorks, “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles”, MathWorks Documentation. [Online]. Available: https://in.mathworks.com/help/stats/classification-withunequal-misclassification-costs.html

Zhou, Z.-H., and X.-Y. Liu. "On Multi-Class Cost-Sensitive Learning." Computational Intelligence, Vol. 26, Issue 3, 2010, pp. 232–257.

Seiffert, C., T. Khoshgoftaar, J. Hulse, and A. Napolitano. "RUSBoost: Improving classification performance when training data is skewed." 19th International Conference on Pattern Recognition, 2008, pp. 1–4.

Federal Aviation Administration, Helicopter Flying Handbook, FAA-H-8083-21B, U.S. Department of Transportation, 2019. [Online]. Available: https://www.faa.gov/sites/faa.gov/files/regulations_policies/handbooks_manuals/aviation/faa-h-8083-21.pdf. [Accessed: 20-Sep-2024]

Stoppiglia, Hervé, Gérard Dreyfus, Rémi Dubois, and Yacine Oussar. "Ranking a random feature for variable and feature selection." The Journal of Machine Learning Research 3 (2003): 1399-1414.

MathWorks, “Predictor importance for ensemble models,” MathWorks Documentation. [Online]. Available: https://in.mathworks.com/help/stats/classreg.learning.classif.compactclassificationensemble.predictorimportance.html

MathWorks, “Automated classifier selection with Bayesian and ASHA optimization,” MathWorks Documentation. [Online]. Available: https://www.mathworks.com/help/releases/R2024b/stats/automated-classifier-selection-with-bayesianoptimization.html

Li, Liam, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. “A System for Massively Parallel Hyperparameter Tuning.” ArXiv:1810.05934v5 [Cs], March 16, 2020. https://arxiv.org/abs/1810.05934v5.

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall, 1984.

Lundberg, Scott M., and S. Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017): 4765–4774.
Section
Data Challenge Papers