Filippo Vannella

Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

Mar 25, 2026

Matteo Salvatori, Filippo Vannella, Sebastian Macaluso, Stylianos E. Trevlakis, Carlos Segura Perales, José Suarez-Varela, Alexandros-Apostolos A. Boulogeorgos, Ioannis Arapakis

Abstract: HandOver (HO) control in cellular networks is governed by a set of HO control parameters that are traditionally configured through rule-based heuristics. A key parameter for HO optimization is the Cell Individual Offset (CIO), defined for each pair of neighboring cells and used to bias HO triggering decisions. At network scale, tuning CIOs becomes a tightly coupled problem: small changes can redirect mobility flows across multiple neighbors, and static rules often degrade under non-stationary traffic and mobility. We exploit the pairwise structure of CIOs by formulating HO optimization as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) on the network's dual graph. In this representation, each agent controls a neighbor-pair CIO and observes Key Performance Indicators (KPIs) aggregated over its local dual-graph neighborhood, enabling scalable decentralized decisions while preserving graph locality. Building on this formulation, we propose TD3-D-MA, a discrete Multi-Agent Reinforcement Learning (MARL) variant of the TD3 algorithm with a shared-parameter Graph Neural Network (GNN) actor operating on the dual graph and region-wise double critics for training, improving credit assignment in dense deployments. We evaluate TD3-D-MA in an ns-3 system-level simulator configured with real-world network operator parameters across heterogeneous traffic regimes and network topologies. Results show that TD3-D-MA improves network throughput over standard HO heuristics and centralized RL baselines, and generalizes robustly under topology and traffic shifts.
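The dual-graph representation described in this abstract can be illustrated with a minimal sketch. It assumes undirected neighbor pairs and a hypothetical four-cell topology; the function name `dual_graph` and the cell labels are illustrative and do not come from the paper. Each neighbor pair (one CIO, hence one agent) becomes a node, and two such nodes are linked when the pairs share a cell.

```python
from itertools import combinations


def dual_graph(edges):
    """Build the dual (line) graph of an undirected cell-neighbor graph.

    Each neighbor pair becomes a node; two nodes are adjacent when the
    underlying pairs have a cell in common.
    """
    # Normalize pairs so ("B", "A") and ("A", "B") are the same node.
    nodes = sorted({tuple(sorted(edge)) for edge in edges})
    adjacency = {node: set() for node in nodes}
    for u, v in combinations(nodes, 2):
        if set(u) & set(v):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Hypothetical topology: cells A, B, C form a triangle; D neighbors only C.
cell_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
dual = dual_graph(cell_edges)

# The agent for pair (C, D) would aggregate KPIs over these neighbor pairs.
print(sorted(dual[("C", "D")]))
```

In this sketch, the local neighborhood of the agent controlling the (C, D) offset consists of the pairs (A, C) and (B, C), which is the kind of locality the abstract refers to.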

Fair Best Arm Identification with Fixed Confidence

Aug 30, 2024

Alessio Russo, Filippo Vannella

Abstract: In this work, we present a novel framework for Best Arm Identification (BAI) under fairness constraints, a setting that we refer to as F-BAI (fair BAI). Unlike traditional BAI, which solely focuses on identifying the optimal arm with minimal sample complexity, F-BAI also includes a set of fairness constraints. These constraints impose a lower limit on the selection rate of each arm and can be either model-agnostic or model-dependent. For this setting, we establish an instance-specific sample complexity lower bound and analyze the price of fairness, quantifying how fairness impacts sample complexity. Based on the sample complexity lower bound, we propose F-TaS, an algorithm provably matching the sample complexity lower bound, while ensuring that the fairness constraints are satisfied. Numerical results, conducted using both a synthetic model and a practical wireless scheduling application, show the efficiency of F-TaS in minimizing the sample complexity while achieving low fairness violations.
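To make the idea of a per-arm lower limit on selection rates concrete, here is a minimal sketch. It is not F-TaS: it simply mixes a target sampling proportion with fixed, model-agnostic minimum rates so that every arm keeps at least its guaranteed share. The function name `fair_allocation` and all numbers are assumptions for illustration.

```python
def fair_allocation(target, min_rates):
    """Return sampling proportions that respect per-arm minimum rates.

    Every arm first receives its guaranteed minimum; the leftover
    probability mass is then split according to the target proportions.
    """
    slack = 1.0 - sum(min_rates)
    if slack < 0:
        raise ValueError("minimum rates must sum to at most 1")
    return [p + slack * w for p, w in zip(min_rates, target)]


# Hypothetical 3-arm instance: an unconstrained strategy would favor arm 0.
target = [0.7, 0.2, 0.1]
min_rates = [0.1, 0.1, 0.1]
allocation = fair_allocation(target, min_rates)
print(allocation)
```

The gap between `target` and `allocation` gives an informal feel for why fairness can cost extra samples: mass is moved away from the proportions that would be chosen without constraints.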

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Jan 06, 2022

Filippo Vannella, Alexandre Proutiere, Yassir Jedra, Jaeseong Jeong

Abstract: Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms that learn optimal tilt control policies from existing data (in the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Arm Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures current network conditions; the reward corresponds to an improvement of performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping the context to an arm with maximal reward). For CL-MAB in both active and passive learning settings, we derive information-theoretical lower bounds on the number of samples required by any algorithm returning an approximately optimal policy with a given level of certainty, and devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks, and show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
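The notion of a policy as a function mapping the context to an arm with maximal reward can be sketched as follows, assuming a linear reward model (expected reward equals the inner product of a parameter vector and a context-arm feature vector). The feature map, the parameter values, and the arm encoding (-1 down-tilt, 0 keep, +1 up-tilt) are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def greedy_policy(theta, feature_map, arms):
    """Return a policy picking the arm with the largest linear reward."""
    def policy(context):
        return max(arms, key=lambda arm: dot(theta, feature_map(context, arm)))
    return policy


def tilt_features(context, arm):
    # Hypothetical features: the context is (coverage_gap, capacity_gap),
    # and the last entry penalizes any tilt change.
    coverage_gap, capacity_gap = context
    return [arm * coverage_gap, arm * capacity_gap, -abs(arm)]


ARMS = (-1, 0, 1)          # down-tilt, keep, up-tilt (assumed encoding)
theta = [1.0, -1.0, 0.1]   # assumed parameter estimate, not from the paper
policy = greedy_policy(theta, tilt_features, ARMS)

print(policy((0.5, 0.1)), policy((0.1, 0.5)), policy((0.2, 0.2)))
```

In a best policy identification setting, `theta` would be estimated from samples until the greedy policy is approximately optimal with the required confidence; the estimation and stopping rule are outside this sketch.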

A Graph Attention Learning Approach to Antenna Tilt Optimization

Dec 27, 2021

Yifei Jin, Filippo Vannella, Maxime Bouton, Jaeseong Jeong, Ezeddin Al Hakim

Abstract: 6G will move mobile networks towards increasing levels of complexity. To deal with this complexity, optimization of network parameters is key to ensure high performance and timely adaptivity to dynamic network environments. The optimization of the antenna tilt provides a practical and cost-efficient method to improve coverage and capacity in the network. Previous methods based on Reinforcement Learning (RL) have shown great promise for tilt optimization by learning adaptive policies outperforming traditional tilt optimization methods. However, most existing RL methods are based on single-cell feature representations, which fail to fully characterize the agent state, resulting in suboptimal performance. Most such methods also lack scalability, due to state-action explosion, and the ability to generalize. In this paper, we propose a Graph Attention Q-learning (GAQ) algorithm for tilt optimization. GAQ relies on a graph attention mechanism to select relevant neighbor information, improve the agent state representation, and update the tilt control policy based on a history of observations using a Deep Q-Network (DQN). We show that GAQ efficiently captures important network information and outperforms standard DQN with local information by a large margin. In addition, we demonstrate its ability to generalize to network deployments of different sizes and densities.
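A rough sketch of how an attention mechanism can select relevant neighbor information is given below. It uses plain dot-product scores followed by a softmax, which is a simplification and not necessarily the attention form used in GAQ; the feature vectors are made up for the example.

```python
import math


def attention_aggregate(own, neighbors):
    """Weight neighbor features by softmax of dot-product scores.

    Returns the attention weights and the weighted sum of neighbor
    features, which could be appended to the agent's own features.
    """
    scores = [sum(a * b for a, b in zip(own, n)) for n in neighbors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(own))
    ]
    return weights, aggregated


# Hypothetical 2-dimensional cell features for one cell and two neighbors.
own_features = [1.0, 0.0]
neighbor_features = [[1.0, 0.0], [0.0, 1.0]]
weights, summary = attention_aggregate(own_features, neighbor_features)
print(weights, summary)
```

The neighbor whose features align with the cell's own features receives the larger weight, so the aggregated vector is dominated by the more relevant neighbor.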

Safe Reinforcement Learning for Antenna Tilt Optimisation using Shielding and Multiple Baselines

Dec 02, 2020

Saman Feghhi, Erik Aumayr, Filippo Vannella, Ezeddin Al Hakim, Grigorios Iakovidis

Abstract: Safe interaction with the environment is one of the most challenging aspects of Reinforcement Learning (RL) when applied to real-world problems. This is particularly important when unsafe actions have a high or irreversible negative impact on the environment. In the context of network management operations, Remote Electrical Tilt (RET) optimisation is a safety-critical application in which exploratory modifications of antenna tilt angles of Base Stations (BSs) can cause significant performance degradation in the network. In this paper, we propose a modular Safe Reinforcement Learning (SRL) architecture which is then used to address RET optimisation in cellular networks. In this approach, a safety shield continuously benchmarks the performance of RL agents against safe baselines, and determines safe antenna tilt updates to be performed on the network. Our results demonstrate improved performance of the SRL agent over the baseline while ensuring the safety of the performed actions.
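One way to picture a shield that benchmarks an agent against several safe baselines is the following sketch. The decision rule (accept the agent's tilt update only if its estimated performance is at least that of the best baseline, minus a tolerance) is an assumption made for illustration and is not taken from the paper; the names and scores are hypothetical.

```python
def shield(agent_action, agent_score, baselines, tolerance=0.0):
    """Pick the agent's action only if it benchmarks well enough.

    `baselines` maps a baseline name to (action, estimated_score).
    Returns (source, action), where source names who chose the action.
    """
    best_name, (best_action, best_score) = max(
        baselines.items(), key=lambda item: item[1][1]
    )
    if agent_score >= best_score - tolerance:
        return "agent", agent_action
    return best_name, best_action


# Hypothetical baselines with estimated performance scores.
safe_baselines = {
    "rule_based": (-1, 0.62),
    "keep_tilt": (0, 0.60),
}

print(shield(+1, 0.70, safe_baselines))  # agent outperforms the baselines
print(shield(+1, 0.50, safe_baselines))  # shield falls back to a baseline
```

Keeping the shield separate from the agent, as in this sketch, mirrors the modular structure the abstract describes: the agent and the baselines can be swapped without changing the safety logic.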

Remote Electrical Tilt Optimization via Safe Reinforcement Learning

Oct 12, 2020

Filippo Vannella, Grigorios Iakovidis, Ezeddin Al Hakim, Erik Aumayr, Saman Feghhi

Abstract: Remote Electrical Tilt (RET) optimization is an efficient method for adjusting the vertical tilt angle of Base Station (BS) antennas in order to optimize Key Performance Indicators (KPIs) of the network. Reinforcement Learning (RL) provides a powerful framework for RET optimization because of its self-learning capabilities and adaptivity to environmental changes. However, an RL agent may execute unsafe actions during the course of its interaction, i.e., actions resulting in undesired network performance degradation. Since the reliability of services is critical for Mobile Network Operators (MNOs), the prospect of performance degradation has prohibited the real-world deployment of RL methods for RET optimization. In this work, we model the RET optimization problem in the Safe Reinforcement Learning (SRL) framework with the goal of learning a tilt control strategy providing performance improvement guarantees with respect to a safe baseline. We leverage a recent SRL method, namely Safe Policy Improvement through Baseline Bootstrapping (SPIBB), to learn an improved policy from an offline dataset of interactions collected by the safe baseline. Our experiments show that the proposed approach is able to learn a safe and improved tilt update policy, providing a higher degree of reliability and potential for real-world network deployment.
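The core idea of baseline bootstrapping can be sketched for a single state: actions observed too rarely in the offline dataset keep the baseline's probability, and only the remaining mass is moved to the best well-observed action. This is a simplified, single-state rendering of the policy improvement step; the count threshold `n_wedge`, the value estimates, and the action ordering are assumptions for illustration, not values from the paper.

```python
def spibb_improve(baseline_probs, counts, q_values, n_wedge):
    """One-state policy improvement with baseline bootstrapping.

    Actions with fewer than `n_wedge` observations copy the baseline
    probability; the mass of the well-observed actions goes to the one
    with the highest estimated value.
    """
    n_actions = len(baseline_probs)
    trusted = [a for a in range(n_actions) if counts[a] >= n_wedge]
    if not trusted:
        # Nothing is well estimated: fall back to the baseline policy.
        return list(baseline_probs)
    policy = [0.0] * n_actions
    for a in range(n_actions):
        if a not in trusted:
            policy[a] = baseline_probs[a]
    best = max(trusted, key=lambda a: q_values[a])
    policy[best] = sum(baseline_probs[a] for a in trusted)
    return policy


# Hypothetical state with three tilt actions: down-tilt, keep, up-tilt.
baseline = [0.5, 0.3, 0.2]
counts = [100, 2, 50]        # "keep" was rarely logged in this state
q_values = [1.0, 5.0, 2.0]   # the high estimate for "keep" is unreliable
new_policy = spibb_improve(baseline, counts, q_values, n_wedge=10)
print(new_policy)
```

Note how the rarely observed action keeps its baseline probability even though its value estimate is the highest; the improvement only acts where the data supports it.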

Off-policy Learning for Remote Electrical Tilt Optimization

May 21, 2020

Filippo Vannella, Jaeseong Jeong, Alexandre Proutiere

Abstract: We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed Bandit (CMAB) techniques. The goal in RET optimization is to control the orientation of the vertical tilt angle of the antenna to optimize Key Performance Indicators (KPIs) representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a new policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy CMAB framework. We propose CMAB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.
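A standard building block for off-policy evaluation in contextual bandits is the inverse propensity scoring (IPS) estimator, sketched below as one plausible way to score a candidate tilt policy from logged data. The abstract does not say which estimators the paper uses, so this is an assumption; the log format, contexts, rewards, and propensities are all hypothetical.

```python
def ips_value(logs, target_policy):
    """Estimate a deterministic policy's value from logged bandit data.

    Each log entry is (context, action, reward, propensity), where the
    propensity is the probability with which the logging policy chose
    the logged action in that context.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(logs)


# Hypothetical logs: contexts are load levels, actions are tilt updates.
logs = [
    ("low", 1, 1.0, 0.5),
    ("low", 0, 0.0, 0.5),
    ("high", -1, 1.0, 0.25),
    ("high", 1, 0.0, 0.25),
]


def candidate_policy(context):
    # Assumed candidate: up-tilt under low load, down-tilt under high load.
    return 1 if context == "low" else -1


print(ips_value(logs, candidate_policy))
```

Because the estimate relies only on logged interactions, no new tilt updates need to be tried on the live network, which is the operational concern the abstract raises about online exploration.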
