Abstract: Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents cooperatively optimize prompt routing and LM deployment through natural language reasoning over runtime telemetry and historical experience. To evaluate its performance, we further develop a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communication. Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness to a normalized Jain index of 0.90 relative to baseline methods. Moreover, it adapts quickly without fine-tuning, offering a generalizable approach for optimizing GenAI services in edge environments.
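
The abstract reports fairness as a normalized Jain index of 0.90. As a point of reference, the sketch below computes Jain's fairness index over per-user service metrics; the (J - 1/n)/(1 - 1/n) rescaling is one common normalization convention and is an assumption here, since the abstract does not specify which variant the paper uses.

```python
# Minimal sketch: Jain's fairness index over per-user latencies or throughputs.
# The normalization convention below is an assumption, not taken from the paper.

def jain_index(values):
    """Jain's fairness index: 1.0 = perfectly fair, 1/n = least fair."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    return (total * total) / (n * squares) if squares > 0 else 1.0

def normalized_jain_index(values):
    """Rescale Jain's index from [1/n, 1] to [0, 1] (assumed convention)."""
    n = len(values)
    if n <= 1:
        return 1.0
    j = jain_index(values)
    return (j - 1.0 / n) / (1.0 - 1.0 / n)

# Example: per-user average inference latencies (seconds) across four edge users.
latencies = [1.2, 1.3, 1.1, 1.4]
print(round(jain_index(latencies), 3))             # close to 1.0 -> nearly fair
print(round(normalized_jain_index(latencies), 3))
```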




Abstract: The rapid growth of end-user AI applications, such as computer vision and generative AI, has led to immense data and processing demands that often exceed the capabilities of user devices. Edge AI addresses this by offloading computation to the network edge, which is crucial for future services in 6G networks. However, it faces challenges such as limited resources during simultaneous offloads and the often unrealistic assumption of a homogeneous system architecture. To address these, we propose a research roadmap focused on profiling AI models: capturing data about model types, hyperparameters, and underlying hardware to predict resource utilisation and task completion time. Initial experiments with over 3,000 runs show promise in optimising resource allocation and enhancing Edge AI performance.
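
To make the profiling idea concrete, the following sketch records model type, hyperparameters, and hardware per run, then uses a nearest-match heuristic to predict completion time for a new task. The record fields, example values, and the predictor itself are illustrative assumptions rather than the paper's actual pipeline.

```python
# Minimal sketch of per-run profiling and completion-time prediction (assumed schema).
from dataclasses import dataclass

@dataclass
class ProfileRecord:
    model_type: str        # e.g. "resnet50", "llama-7b"
    batch_size: int
    input_size: int        # e.g. image resolution or prompt length
    gpu: str               # edge accelerator identifier
    peak_mem_mb: float     # observed resource utilisation
    completion_s: float    # observed task completion time

def predict_completion_time(records, model_type, batch_size, input_size, gpu):
    """Estimate completion time from the closest matching past run (assumed heuristic)."""
    candidates = [r for r in records if r.model_type == model_type and r.gpu == gpu]
    if not candidates:
        return None  # no profile yet; fall back to direct measurement
    # Pick the record whose workload shape is closest to the requested one.
    closest = min(candidates,
                  key=lambda r: abs(r.batch_size - batch_size) + abs(r.input_size - input_size))
    # Scale linearly with batch size as a first-order approximation.
    return closest.completion_s * batch_size / max(closest.batch_size, 1)

# Hypothetical profiles gathered from prior runs.
profiles = [
    ProfileRecord("resnet50", 8, 224, "jetson-orin", 1800.0, 0.42),
    ProfileRecord("resnet50", 16, 224, "jetson-orin", 2600.0, 0.78),
]
print(predict_completion_time(profiles, "resnet50", 12, 224, "jetson-orin"))
```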




Abstract: The industrial landscape is rapidly evolving with the advent of 6G applications, which demand massive connectivity, high computational capacity, and ultra-low latency. These requirements present new challenges that conventional strategies can no longer address efficiently. In response, this article underscores the transformative potential of Deep Reinforcement Learning (DRL) for 6G, highlighting its advantages over classic machine learning solutions in meeting 6G demands. The necessity of DRL is further validated through three DRL applications spanning an end-to-end communication procedure: wireless access control, baseband function placement, and network slicing coordination. However, DRL-based network management initiatives are far from mature. We extend the discussion to identify the challenges of applying DRL in practical networks and explore potential solutions along with their respective limitations. Finally, these insights are validated through a practical DRL deployment that manages network slices on a testbed.
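
As a toy illustration of DRL-based slice coordination, the sketch below trains a tabular Q-learning agent to split bandwidth between a URLLC and an eMBB slice. The environment, reward shaping, and hyperparameters are assumptions made for illustration and do not reflect the testbed configuration described in the article.

```python
# Minimal sketch: tabular Q-learning for toy network-slice bandwidth allocation.
import random

ACTIONS = [0, 1, 2]          # bandwidth units granted to the URLLC slice (out of 2)
STATES = [0, 1, 2]           # current URLLC load level: low / medium / high

def reward(state, action):
    # Reward meeting URLLC demand without starving the eMBB slice (assumed shaping).
    urllc_satisfied = min(action, state)
    embb_share = 2 - action
    return 2.0 * urllc_satisfied + 1.0 * embb_share - 3.0 * max(state - action, 0)

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2

state = random.choice(STATES)
for step in range(5000):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    next_state = random.choice(STATES)        # toy load dynamics
    target = reward(state, action) + gamma * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (target - q[(state, action)])
    state = next_state

# Learned policy: bandwidth units to grant the URLLC slice at each load level.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES})
```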