Leveraging Few-Shot In-Context Learning for Scaling Railway Log Anomaly Detection

Published Jan 13, 2026
Quentin Possamaï Rajesh Bonangi Alexandre Trilla Ossee Josepha Charlesia Yiboe Kenza Saiah Nenad Mijatovic

Abstract

This paper presents a scalable, data-driven approach to anomaly detection in railway signaling logs using Large Language Models (LLMs) and in-context learning. By classifying log keys (the structural templates of log messages) instead of individual messages, the method sharply reduces the number of required model calls, thereby lowering computational costs (in terms of energy or monetary resources). Expert-labeled log keys are incorporated into the LLM prompts to help the models distinguish normal from abnormal log messages. Multiple state-of-the-art LLMs are evaluated on this task. Performance increases as more labeled examples are added to the prompt, although the marginal gain diminishes with each additional label. Further analysis indicates that GPT-4.1 offers the best balance of monetary cost, response time, and F1 score for this application. The study highlights both the advantages and the limitations of in-context learning for railway log anomaly detection: it can leverage expert-labeled examples without additional model training, but it is sensitive to data imbalance and it excludes parameter values. The paper also discusses avenues for future improvement, such as model fine-tuning, prompt enrichment with additional contextual information, and the potential use of Retrieval-Augmented Generation (RAG) or self-feedback strategies to enhance classification performance.
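
To make the log-key idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a crude regex masker (`to_log_key`) as a stand-in for a real log parser, uses invented sample messages, and builds a hypothetical few-shot prompt (`build_prompt`) whose wording is an assumption. No LLM is called; the point is only that one prompt per unlabeled log key replaces one call per message.

```python
import re
from collections import Counter


def to_log_key(message: str) -> str:
    """Mask variable tokens with <*> (illustrative stand-in for a real log parser)."""
    masked = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", message)  # hexadecimal values
    return re.sub(r"\b\d+(?:\.\d+)?\b", "<*>", masked)      # integers and decimals


def build_prompt(labeled_keys, query_key):
    """Assemble a few-shot prompt from expert-labeled log keys (hypothetical wording)."""
    lines = ["Classify each railway signaling log key as NORMAL or ABNORMAL.", ""]
    for key, label in labeled_keys:
        lines += [f"Log key: {key}", f"Label: {label}", ""]
    lines += [f"Log key: {query_key}", "Label:"]
    return "\n".join(lines)


# Invented sample messages, for illustration only.
messages = [
    "Train 4012 entered block 17",
    "Train 4013 entered block 18",
    "Balise read timeout after 250 ms on antenna 2",
    "Train 4012 entered block 19",
    "Balise read timeout after 310 ms on antenna 1",
]

# Many messages collapse onto a few log keys.
key_counts = Counter(to_log_key(m) for m in messages)

# Hypothetical expert labels that serve as the in-context examples.
labeled = [("Train <*> entered block <*>", "NORMAL")]
known = {key for key, _ in labeled}

# One prompt per unlabeled log key instead of one per message.
prompts = {k: build_prompt(labeled, k) for k in key_counts if k not in known}

print(f"{len(messages)} messages, {len(key_counts)} log keys, {len(prompts)} prompt(s)")
```

In this toy run the masking step discards the parameter values (train numbers, timeouts), which mirrors the limitation noted in the abstract: a classifier that sees only log keys cannot react to an abnormal value inside an otherwise normal template.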

Keywords

Anomaly detection, Railway signaling logs, Large Language Models, In-context learning, Log parsing

Section
Regular Session Papers