A Comprehensive Approach to Fault Classification of Helicopter Engines with Adaboost Ensemble Model
Abstract
This work uses the helicopter turbine engine performance datasets from the PHM North America 2024 Conference Data Challenge. The datasets are large and moderately imbalanced. To address these challenges, we demonstrate tools for feature engineering, augmentation and selection, model exploration, visualization, model explainability, and confidence margin estimation. The work was performed entirely in MATLAB. These tools are generally applicable to data-driven health modeling and prediction in real-world applications.
Initially, we explored the 742k observations in the training set, noting a 60-40 split between healthy and faulty labels, and identified two major operational clusters within the data. We enhanced the dataset by removing duplicates and engineered new features based on domain knowledge, expanding the feature set to 242 dimensions.
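The exploration and clean-up steps described above can be sketched in a few lines. The original work was done in MATLAB; the Python below is only an illustrative sketch, and the function names `drop_duplicate_rows` and `class_shares` are hypothetical helpers, not part of the authors' code.

```python
from collections import Counter


def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows must be hashable to be compared as a whole
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def class_shares(labels):
    """Return the fraction of observations carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}
```

Checking the class shares before and after duplicate removal is a quick way to confirm that the clean-up did not shift the healthy-to-faulty ratio noticeably.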
For torque margin estimation, however, we trained a regression model on a limited subset of 18 features, which included features engineered from domain knowledge, quadratic terms, and linear interactions between all terms. For the final submission, we used a stepwise linear regression model to optimize feature selection. This approach achieved a perfect regression score on the test data, supported by a consistent torque margin residual range of ±0.5%. The model's RMSE and MAE were well suited to use with a normal distribution probability density function.
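One way to turn regression residuals into a confidence statement with a normal distribution is sketched below. It assumes, purely for illustration, that the residuals are zero-mean and normally distributed with a standard deviation equal to the RMSE; the abstract does not state how the distribution parameters were chosen. The original work used MATLAB, and the Python function names here are hypothetical.

```python
import math
from statistics import NormalDist


def residual_sigma(y_true, y_pred):
    """RMSE of the residuals, used here as the sigma of the error model (assumption)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def prediction_density(y_pred, sigma, y_candidate):
    """Normal PDF centred on the prediction, evaluated at a candidate value."""
    return NormalDist(mu=y_pred, sigma=sigma).pdf(y_candidate)


def central_interval(y_pred, sigma, coverage=0.95):
    """Symmetric interval around the prediction that holds `coverage` probability mass."""
    dist = NormalDist(mu=y_pred, sigma=sigma)
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
```

A smaller RMSE gives a narrower distribution, so the density at the true value is higher whenever the prediction is close to it.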
For fault classification, we reduced the feature set to 58 using dimensionality reduction techniques and balanced the data with upsampling and down-weighting the minority class. We employed ASHA (Asynchronous Successive Halving Algorithm) in conjunction with AutoML to efficiently determine the most suitable model family, which saved significant compute time. Subsequently, we trained ensemble models, including bagged trees and AdaBoost (Adaptive Boosting), which minimized false negatives and false positives and achieved robust classification performance. This was particularly critical given the high penalty for false negatives in the data challenge. The MathWorks team's score on the testing data was 0.9686 at the close of the competition and was later improved to 0.9867.
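The boosting idea behind AdaBoost can be illustrated with a minimal sketch that uses one-feature threshold rules (decision stumps) as weak learners. This is not the authors' MATLAB model: the stump learner, the function names, and the `fault_weight` parameter (a simple way to give faulty samples a larger initial weight, reflecting a higher cost for false negatives) are all assumptions made for illustration. Labels are encoded as +1 (faulty) and -1 (healthy).

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Predict `sign` when the feature exceeds the threshold, otherwise `-sign`."""
    return sign if row[feature] > threshold else -sign


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost_fit(X, y, n_rounds=10, fault_weight=1.0):
    """Train boosted stumps; `fault_weight` > 1 up-weights faulty samples (assumption)."""
    w = [fault_weight if yi == 1 else 1.0 for yi in y]
    total = sum(w)
    w = [wi / total for wi in w]
    model = []
    for _ in range(n_rounds):
        err, feature, threshold, sign = fit_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance, stop early
            break
        err = max(err, 1e-12)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, feature, threshold, sign))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, feature, threshold, sign)
                for alpha, feature, threshold, sign in model)
    return 1 if score >= 0 else -1
```

For example, with the one-dimensional data `X = [[0], [1], [2], [3], [4], [5]]` and labels `y = [1, 1, -1, -1, 1, 1]`, no single stump separates the classes, but three boosting rounds combine into a rule that classifies every training point correctly.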
Our approach demonstrates the effectiveness of combining strategic data processing, feature engineering, and model selection to enhance predictive accuracy in complex operational datasets.
Fault Classification, Machine Learning, Adaptive Boosting, Feature Engineering, Shapley, Explainable AI, Helicopter Engines
This work is licensed under a Creative Commons Attribution 3.0 Unported License.