Abstract:Multimodal Large Language Models (MLLMs) are a major focus of recent AI research. However, most prior work focuses on static image understanding, while their ability to process sequential audio-video data remains underexplored. This gap highlights the need for a high-quality benchmark to systematically evaluate MLLM performance in a real-world setting. We introduce SONIC-O1, a comprehensive, fully human-verified benchmark spanning 13 real-world conversational domains with 4,958 annotations and demographic metadata. SONIC-O1 evaluates MLLMs on key tasks, including open-ended summarization, multiple-choice question (MCQ) answering, and temporal localization with supporting rationales (reasoning). Experiments on closed- and open-source models reveal limitations. While the performance gap in MCQ accuracy between two model families is relatively small, we observe a substantial 22.6% performance difference in temporal localization between the best performing closed-source and open-source models. Performance further degrades across demographic groups, indicating persistent disparities in model behavior. Overall, SONIC-O1 provides an open evaluation suite for temporally grounded and socially robust multimodal understanding. We release SONIC-O1 for reproducibility and research: Project page: https://vectorinstitute.github.io/sonic-o1/ Dataset: https://huggingface.co/datasets/vector-institute/sonic-o1 Github: https://github.com/vectorinstitute/sonic-o1 Leaderboard: https://huggingface.co/spaces/vector-institute/sonic-o1-leaderboard
Abstract:The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to capture the inter-operator and inter-frequency variations essential for proactive network planning. To this end, this paper introduces EMFusion, a conditional multivariate diffusion-based probabilistic forecasting framework that integrates diverse contextual factors (e.g., time of day, season, and holidays) while providing explicit uncertainty estimates. The proposed architecture features a residual U-Net backbone enhanced by a cross-attention mechanism that dynamically integrates external conditions to guide the generation process. Furthermore, EMFusion integrates an imputation-based sampling strategy that treats forecasting as a structural inpainting task, ensuring temporal coherence even with irregular measurements. Unlike standard point forecasters, EMFusion generates calibrated probabilistic prediction intervals directly from the learned conditional distribution, providing explicit uncertainty quantification essential for trustworthy decision-making. Numerical experiments conducted on frequency-selective EMF datasets demonstrate that EMFusion with the contextual information of working hours outperforms the baseline models with or without conditions. The EMFusion outperforms the best baseline by 23.85% in continuous ranked probability score (CRPS), 13.93% in normalized root mean square error, and reduces prediction CRPS error by 22.47%.




Abstract:Unmanned aerial vehicles (UAVs) have been widely adopted in various real-world applications. However, the control and optimization of multi-UAV systems remain a significant challenge, particularly in dynamic and constrained environments. This work explores the joint motion and communication control of multiple UAVs operating within integrated terrestrial and non-terrestrial networks that include high-altitude platform stations (HAPS). Specifically, we consider an aerial highway scenario in which UAVs must accelerate, decelerate, and change lanes to avoid collisions and maintain overall traffic flow. Different from existing studies, we propose a novel hierarchical and collaborative method based on large language models (LLMs). In our approach, an LLM deployed on the HAPS performs UAV access control, while another LLM onboard each UAV handles motion planning and control. This LLM-based framework leverages the rich knowledge embedded in pre-trained models to enable both high-level strategic planning and low-level tactical decisions. This knowledge-driven paradigm holds great potential for the development of next-generation 3D aerial highway systems. Experimental results demonstrate that our proposed collaborative LLM-based method achieves higher system rewards, lower operational costs, and significantly reduced UAV collision rates compared to baseline approaches.
Abstract:With the recent advancements in wireless technologies, forecasting electromagnetic field (EMF) exposure has become critical to enable proactive network spectrum and power allocation, as well as network deployment planning. In this paper, we develop a deep learning (DL) time series forecasting framework referred to as \textit{EMForecaster}. The proposed DL architecture employs patching to process temporal patterns at multiple scales, complemented by reversible instance normalization and mixing operations along both temporal and patch dimensions for efficient feature extraction. We augment {EMForecaster} with a conformal prediction mechanism, which is independent of the data distribution, to enhance the trustworthiness of model predictions via uncertainty quantification of forecasts. This conformal prediction mechanism ensures that the ground truth lies within a prediction interval with target error rate $\alpha$, where $1-\alpha$ is referred to as coverage. However, a trade-off exists, as increasing coverage often results in wider prediction intervals. To address this challenge, we propose a new metric called the \textit{Trade-off Score}, that balances trustworthiness of the forecast (i.e., coverage) and the width of prediction interval. Our experiments demonstrate that EMForecaster achieves superior performance across diverse EMF datasets, spanning both short-term and long-term prediction horizons. In point forecasting tasks, EMForecaster substantially outperforms current state-of-the-art DL approaches, showing improvements of 53.97\% over the Transformer architecture and 38.44\% over the average of all baseline models. EMForecaster also exhibits an excellent balance between prediction interval width and coverage in conformal forecasting, measured by the tradeoff score, showing marked improvements of 24.73\% over the average baseline and 49.17\% over the Transformer architecture.
Abstract:This paper proposes a novel framework for real-time adaptive-bitrate video streaming by integrating latent diffusion models (LDMs) within the FFmpeg techniques. This solution addresses the challenges of high bandwidth usage, storage inefficiencies, and quality of experience (QoE) degradation associated with traditional constant bitrate streaming (CBS) and adaptive bitrate streaming (ABS). The proposed approach leverages LDMs to compress I-frames into a latent space, offering significant storage and semantic transmission savings without sacrificing high visual quality. While it keeps B-frames and P-frames as adjustment metadata to ensure efficient video reconstruction at the user side, the proposed framework is complemented with the most state-of-the-art denoising and video frame interpolation (VFI) techniques. These techniques mitigate semantic ambiguity and restore temporal coherence between frames, even in noisy wireless communication environments. Experimental results demonstrate the proposed method achieves high-quality video streaming with optimized bandwidth usage, outperforming state-of-the-art solutions in terms of QoE and resource efficiency. This work opens new possibilities for scalable real-time video streaming in 5G and future post-5G networks.
Abstract:Efficient resource allocation is essential for optimizing various tasks in wireless networks, which are usually formulated as generalized assignment problems (GAP). GAP, as a generalized version of the linear sum assignment problem, involves both equality and inequality constraints that add computational challenges. In this work, we present a novel Conditional Value at Risk (CVaR)-based Variational Quantum Eigensolver (VQE) framework to address GAP in vehicular networks (VNets). Our approach leverages a hybrid quantum-classical structure, integrating a tailored cost function that balances both objective and constraint-specific penalties to improve solution quality and stability. Using the CVaR-VQE model, we handle the GAP efficiently by focusing optimization on the lower tail of the solution space, enhancing both convergence and resilience on noisy intermediate-scale quantum (NISQ) devices. We apply this framework to a user-association problem in VNets, where our method achieves 23.5% improvement compared to the deep neural network (DNN) approach.




Abstract:Employing wireless systems with dual sensing and communications functionalities is becoming critical in next generation of wireless networks. In this paper, we propose a robust design for over-the-air federated edge learning (OTA-FEEL) that leverages sensing capabilities at the parameter server (PS) to mitigate the impact of target echoes on the analog model aggregation. We first derive novel expressions for the Cramer-Rao bound of the target response and mean squared error (MSE) of the estimated global model to measure radar sensing and model aggregation quality, respectively. Then, we develop a joint scheduling and beamforming framework that optimizes the OTA-FEEL performance while keeping the sensing and communication quality, determined respectively in terms of Cramer-Rao bound and achievable downlink rate, in a desired range. The resulting scheduling problem reduces to a combinatorial mixed-integer nonlinear programming problem (MINLP). We develop a low-complexity hierarchical method based on the matching pursuit algorithm used widely for sparse recovery in the literature of compressed sensing. The proposed algorithm uses a step-wise strategy to omit the least effective devices in each iteration based on a metric that captures both the aggregation and sensing quality of the system. It further invokes alternating optimization scheme to iteratively update the downlink beamforming and uplink post-processing by marginally optimizing them in each iteration. Convergence and complexity analysis of the proposed algorithm is presented. Numerical evaluations on MNIST and CIFAR-10 datasets demonstrate the effectiveness of our proposed algorithm. The results show that by leveraging accurate sensing, the target echoes on the uplink signal can be effectively suppressed, ensuring the quality of model aggregation to remain intact despite the interference.



Abstract:The emergence of optical intelligent reflecting surface (OIRS) technologies marks a milestone in optical wireless communication (OWC) systems, enabling enhanced control over light propagation in indoor environments. This capability allows for the customization of channel conditions to achieve specific performance goals. This paper presents an enhancement in downlink cell-free OWC networks through the integration of OIRS. The key focus is on fine-tuning crucial parameters, including transmit power, receiver orientations, OIRS elements allocation, and strategic placement. In particular, a multi-objective optimization problem (MOOP) aimed at simultaneously improving the network's spectral efficiency (SE) and energy efficiency (EE) while adhering to the network's quality of service (QoS) constraints is formulated. The problem is solved by employing the $\epsilon$-constraint method to convert the MOOP into a single-objective optimization problem and solving it through successive convex approximation. Simulation results show the significant impact of OIRS on SE and EE, confirming its effectiveness in improving OWC network performance.




Abstract:Revolutionary sixth-generation wireless communications technologies and applications, notably digital twin networks (DTN), connected autonomous vehicles (CAVs), space-air-ground integrated networks (SAGINs), zero-touch networks, industry 5.0, and healthcare 5.0, are driving next-generation wireless networks (NGWNs). These technologies generate massive data, requiring swift transmission and trillions of device connections, fueling the need for sophisticated next-generation multiple access (NGMA) schemes. NGMA enables massive connectivity in the 6G era, optimizing NGWN operations beyond current multiple access (MA) schemes. This survey showcases non-orthogonal multiple access (NOMA) as NGMA's frontrunner, exploring What has NOMA delivered?, What is NOMA providing?, and What lies ahead?. We present NOMA variants, fundamental operations, and applicability in multi-antenna systems, machine learning, reconfigurable intelligent surfaces (RIS), cognitive radio networks (CRN), integrated sensing and communications (ISAC), terahertz networks, and unmanned aerial vehicles (UAVs). Additionally, we explore NOMA's interplay with state-of-the-art wireless technologies, highlighting its advantages and technical challenges. Finally, we unveil NOMA research trends in the 6G era and provide design recommendations and future perspectives for NOMA as the leading NGMA solution for NGWNs.
Abstract:Large language models (LLMs) have received considerable interest recently due to their outstanding reasoning and comprehension capabilities. This work explores applying LLMs to vehicular networks, aiming to jointly optimize vehicle-to-infrastructure (V2I) communications and autonomous driving (AD) policies. We deploy LLMs for AD decision-making to maximize traffic flow and avoid collisions for road safety, and a double deep Q-learning algorithm (DDQN) is used for V2I optimization to maximize the received data rate and reduce frequent handovers. In particular, for LLM-enabled AD, we employ the Euclidean distance to identify previously explored AD experiences, and then LLMs can learn from past good and bad decisions for further improvement. Then, LLM-based AD decisions will become part of states in V2I problems, and DDQN will optimize the V2I decisions accordingly. After that, the AD and V2I decisions are iteratively optimized until convergence. Such an iterative optimization approach can better explore the interactions between LLMs and conventional reinforcement learning techniques, revealing the potential of using LLMs for network optimization and management. Finally, the simulations demonstrate that our proposed hybrid LLM-DDQN approach outperforms the conventional DDQN algorithm, showing faster convergence and higher average rewards.