Decoding Breast Cancer Mutational Signatures: A Hybrid ElasticNet–XGBoost Approach Using Gene Expression Data

Published Apr 18, 2026
Omji Porwal, Kamal Upreti, Pravin R. Kshirsagar, Sarika Panwar, Anurag Sharma, Ganesh V. Radhakrishnan, Rituraj Jain

Abstract

Somatic mutations in TP53, PIK3CA, and MUC16 are relevant to breast cancer progression and prognosis, but direct mutation profiling by sequencing is not always feasible in practice. Gene expression data may carry indirect transcriptomic patterns linked to the underlying mutational state. This paper proposes an expression-based machine learning model that predicts mutation status in the METABRIC breast cancer cohort. Rather than detecting genetic alterations directly, the proposed method models statistical associations between transcriptomic phenotypes and binary somatic mutation status. A multi-stage feature selection pipeline combining variance filtering, mutual information ranking, and correlation pruning was used to reduce the initial set of 19,000 genes to a compact feature set. The selected features were used to train a hybrid predictive architecture that combines ElasticNet logistic regression with XGBoost, balancing linear regularization against nonlinear interaction modeling. Under five-fold stratified cross-validation, the hybrid model achieved mean ROC-AUC values of 0.94 (TP53), 0.92 (PIK3CA), and 0.90 (MUC16), with stable calibration and equal error rates. Model interpretability was examined through coefficient analysis and SHAP-based explanations, which characterize how expression patterns relate to mutation status. The proposed framework is a hypothesis-generating complement to transcriptomic analysis, and external validation is required to establish its broader generalizability.

Keywords

Breast Cancer, Gene Expression, Mutational Signatures, Precision Oncology, Machine Learning Models

Section
Technical Papers