Abstract: Immersive applications such as virtual and augmented reality impose stringent requirements on frame rate, latency, and synchronization between physical and virtual environments. To meet these requirements, an edge server must render panoramic content, predict the user's head motion, and transmit a portion of the scene that is large enough to cover the user's viewport while remaining within wireless bandwidth constraints. Each transmitted portion produces two feedback signals: prediction feedback, indicating whether the selected portion covers the actual viewport, and transmission feedback, indicating whether the corresponding packets are successfully delivered. Prior work models this problem as a multi-armed bandit with two-level bandit feedback, but fails to exploit the fact that prediction feedback can be computed retrospectively for all candidate portions once the user's head pose is observed. Prediction feedback therefore constitutes full-information feedback rather than bandit feedback. Motivated by this observation, we introduce a two-level hybrid feedback model that combines full-information and bandit feedback, and formulate the portion selection problem as an online learning task under this setting. We derive an instance-dependent regret lower bound for the hybrid feedback model and propose AdaPort, a hybrid learning algorithm that exploits both feedback types to improve learning efficiency. We further establish an instance-dependent regret upper bound that asymptotically matches the lower bound, and demonstrate through real-world trace-driven simulations that AdaPort consistently outperforms state-of-the-art baselines.
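The abstract does not spell out AdaPort's update rules, but the hybrid-feedback structure it describes can be illustrated with a minimal sketch: a generic UCB-style selection rule in which coverage (prediction) estimates are updated for every candidate portion each round, while delivery (transmission) estimates are updated only for the portion actually played. The arm count, the success probabilities, the independence of coverage and delivery, and the specific confidence bonus are all illustrative assumptions here, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 8          # number of candidate portions (hypothetical)
T = 10_000     # rounds

# Hypothetical ground truth, used only to drive the simulation.
p_cover = rng.uniform(0.5, 0.95, size=K)    # P(portion covers the viewport)
p_deliver = rng.uniform(0.6, 0.99, size=K)  # P(packets are delivered)

cover_hat = np.zeros(K)     # full-information: updated for ALL portions each round
deliver_hat = np.zeros(K)   # bandit: updated only for the chosen portion
deliver_cnt = np.zeros(K)

for t in range(1, T + 1):
    # An exploration bonus is needed only where feedback is bandit-limited
    # (transmission); coverage is observed for every candidate once the
    # head pose is known, so those estimates need no optimism.
    bonus = np.sqrt(2 * np.log(t) / np.maximum(deliver_cnt, 1))
    ucb = cover_hat * np.minimum(deliver_hat + bonus, 1.0)
    a = int(np.argmax(ucb)) if t > K else t - 1  # play each arm once first

    # Environment draws for this round.
    covered = rng.random(K) < p_cover          # retrospectively known for all arms
    delivered = rng.random() < p_deliver[a]    # known only for the played arm

    # Full-information update (prediction feedback, all arms).
    cover_hat += (covered - cover_hat) / t
    # Bandit update (transmission feedback, played arm only).
    deliver_cnt[a] += 1
    deliver_hat[a] += (delivered - deliver_hat[a]) / deliver_cnt[a]
```

The asymmetry in the two update rules is the point of the sketch: the full-information signal shrinks the effective exploration problem to the transmission level, which is what lets a hybrid algorithm improve on treating both feedback levels as bandit feedback.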
Abstract: The constrained combinatorial multi-armed bandit model has been widely employed to solve problems in wireless networking and related areas, including wireless scheduling for throughput optimization under unknown channel conditions. Most work in this area combines a bandit learning algorithm with the virtual queue technique to track throughput constraint violations, and designs the algorithm to keep the virtual queue length small. However, in networks where channel conditions change abruptly, the constraints may become infeasible, leading to unbounded growth in the virtual queue length. In this paper, we make the key observation that the dynamics of the head-of-line age, i.e., the age of the oldest packet in the virtual queue, make it a more robust design signal than the virtual queue length. We therefore design a learning-based scheduling policy that uses the head-of-line age in place of the virtual queue length. We show that our policy matches state-of-the-art performance under i.i.d. network conditions. Crucially, we also show that the system remains stable even under abrupt changes in channel conditions and can rapidly recover from periods of constraint infeasibility.
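The contrast between the two signals can be made concrete with a toy simulation: a virtual queue of timestamped packets whose head-of-line age is simply the current slot minus the arrival slot of the oldest unserved packet. Everything below (Bernoulli arrivals and services, the specific rates, the temporary infeasibility window) is our own illustrative assumption, not the paper's system model or policy.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(1)
T = 3000
lam = 0.5  # target throughput, i.e. the virtual-queue arrival rate

queue = deque()        # arrival slots of unserved virtual packets (FIFO)
q_len, hol_age = [], []

for t in range(T):
    # Service rate drops abruptly mid-run (constraint infeasible, since
    # mu < lam), then recovers -- the regime the abstract is concerned with.
    mu = 0.2 if 1000 <= t < 1500 else 0.7

    if rng.random() < lam:           # virtual arrival (constraint debt)
        queue.append(t)
    if queue and rng.random() < mu:  # successful service of the oldest packet
        queue.popleft()

    q_len.append(len(queue))
    hol_age.append(t - queue[0] if queue else 0)

# After the infeasible window ends, one can compare how quickly each
# signal returns to its pre-disruption level.
print("max queue length:", max(q_len), " max HoL age:", max(hol_age))
```

One structural difference is visible even in this toy setting: the head-of-line age grows by at most one per slot no matter how severe the disruption, whereas the queue length grows with the accumulated constraint debt, which is one intuition for why an age-based signal behaves more robustly under abrupt channel changes.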