A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Abstract
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Agentic frameworks have recently demonstrated significant potential for automating inspection workflows, but their use has largely been limited to digital tasks; their application to physical assets in real-world environments remains underexplored. In this work, we propose a hierarchical agentic framework for autonomous drone guidance, focusing on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. Our framework employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while the worker agents reason over and execute low-level actions. Operating entirely in the natural language space, the framework follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and landing) to complex high-level tasks (e.g., locating and reading a pressure gauge). The head agent’s evaluation phase serves as a feedback and/or replanning stage, ensuring that executed actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively in terms of task completion across varying levels of task complexity and of agentic workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling a more autonomous problem-solving approach to industrial inspection tasks without requiring extensive user intervention.
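To make the hierarchical architecture concrete, the following is a minimal Python sketch of the plan, reason, act, evaluate cycle described above. All class names, method signatures, and the fixed two-step task decomposition are illustrative assumptions, not the authors' implementation; in the actual framework, LLM-backed agents would generate plans, low-level drone actions, and evaluations in natural language.

```python
# Hypothetical sketch of the plan-reason-act-evaluate cycle; names and the
# fixed decomposition are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class WorkerAgent:
    """Controls a single drone; reasons over and executes low-level actions."""
    drone_id: str
    log: list = field(default_factory=list)

    def reason_and_act(self, subtask: str) -> str:
        # In the real framework, an LLM would translate the natural-language
        # subtask into drone commands; here we simply record a stub result.
        self.log.append(subtask)
        return f"drone {self.drone_id}: executed '{subtask}'"


@dataclass
class HeadAgent:
    """Performs high-level planning and evaluates worker outcomes."""
    workers: list

    def plan(self, objective: str) -> list[str]:
        # An LLM planner would decompose the objective dynamically; this stub
        # returns a fixed two-subtask decomposition for illustration.
        return [f"navigate toward target for: {objective}",
                f"capture and interpret readout for: {objective}"]

    def evaluate(self, results: list[str]) -> bool:
        # Feedback/replanning gate: accept only if every worker reported back.
        return len(results) == len(self.workers)

    def run(self, objective: str, max_rounds: int = 3) -> list[str]:
        results: list[str] = []
        for _ in range(max_rounds):
            subtasks = self.plan(objective)             # plan
            results = [w.reason_and_act(t)              # reason + act
                       for w, t in zip(self.workers, subtasks)]
            if self.evaluate(results):                  # evaluate
                break                                   # objective met
        return results


if __name__ == "__main__":
    head = HeadAgent(workers=[WorkerAgent("A"), WorkerAgent("B")])
    print(head.run("locate and read the pressure gauge"))
```

The bounded replanning loop in `run` mirrors the evaluation phase the abstract describes: if the head agent rejects the workers' results, it replans rather than returning a misaligned outcome to the user.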
Keywords
artificial intelligence, agentic systems, large language models