Leveraging Few-Shot In-Context Learning for Scaling Railway Log Anomaly Detection
Published
Jan 13, 2026
Quentin Possamaï
Rajesh Bonangi
Alexandre Trilla
Ossee Josepha Charlesia Yiboe
Kenza Saiah
Nenad Mijatovic
Abstract
This paper presents a scalable, data-driven approach for anomaly detection in railway signaling logs using Large Language Models (LLMs) and in-context learning. By classifying log keys (the structural templates of log messages) instead of individual messages, the method sharply reduces the number of required model calls, thereby lowering computational cost, whether measured in energy or in money. Expert-labeled log keys are incorporated into LLM prompts to help the models differentiate between normal and abnormal log messages. Multiple state-of-the-art LLMs are evaluated on this task, revealing that performance increases as more labeled examples are added to the prompt, although the gain diminishes with each additional label. Further analysis indicates that GPT-4.1 offers the best balance of monetary cost, response time, and F1 score for this application. The study highlights both the advantages and the limitations of in-context learning for railway log anomaly detection: it can leverage expert-labeled examples without additional model training, but it is sensitive to data imbalance and excludes parameter values. It further discusses avenues for future improvement, such as model fine-tuning, prompt enrichment with additional contextual information, and the potential use of Retrieval-Augmented Generation (RAG) or self-feedback strategies to enhance classification performance.
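The idea of classifying log keys rather than individual messages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the regex masking in `to_log_key` stands in for a real log parser, the prompt wording in `build_prompt` is hypothetical, the sample messages and the label are invented, and no model is actually called.

```python
import re
from collections import Counter

# Hypothetical masking rule: treat every numeric token as a parameter value.
# A real log parser would be considerably more robust than this.
_PARAM = re.compile(r"\b\d+(?:\.\d+)?\b")


def to_log_key(message: str) -> str:
    """Reduce a log message to its template by masking numeric parameters."""
    return _PARAM.sub("<*>", message)


def build_prompt(labeled_keys, target_key):
    """Assemble a few-shot prompt from expert-labeled log keys (assumed wording)."""
    lines = ["Classify each railway signaling log key as normal or abnormal.", ""]
    for key, label in labeled_keys:
        lines += [f"Log key: {key}", f"Label: {label}", ""]
    lines += [f"Log key: {target_key}", "Label:"]
    return "\n".join(lines)


# Invented example messages, not taken from the paper's dataset.
messages = [
    "Train 4012 entered block 17",
    "Train 4013 entered block 18",
    "Balise read failure on track 3",
    "Train 4020 entered block 2",
    "Balise read failure on track 9",
]

# Many messages collapse onto a few log keys.
key_counts = Counter(to_log_key(m) for m in messages)

# One invented expert label; the remaining keys still need a model call.
labeled = [("Train <*> entered block <*>", "normal")]
already_labeled = {key for key, _ in labeled}
prompts = {k: build_prompt(labeled, k) for k in key_counts if k not in already_labeled}

print(len(messages), "messages,", len(key_counts), "log keys,", len(prompts), "prompts")
```

In this toy setting the number of prompts is bounded by the number of distinct unlabeled log keys rather than by the number of messages, which is the source of the cost reduction described in the abstract. The masking also makes the stated limitation visible: the parameter values (train, block, and track numbers here) never reach the model.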
Keywords
Anomaly detection, Railway signaling logs, Large Language Models, In-context learning, Log parsing
Section
Regular Session Papers

This work is licensed under a Creative Commons Attribution 3.0 Unported License.