Abstract: Cell-free networks leverage distributed access points (APs) to achieve macro-diversity, yet their performance is often constrained by large disparities in channel quality arising from user geometry and blockages. To address this, rotatable antennas (RAs) add a lightweight hardware degree of freedom by steering the antenna boresight toward dominant propagation directions to strengthen unfavorable links, enabling the network to better exploit macro-diversity for higher and more uniform performance. This paper investigates an RA-enabled cell-free downlink network and formulates a max-min rate problem that jointly optimizes transmit beamforming and antenna orientations. To tackle this challenging problem, we develop an alternating-optimization-based algorithm that iteratively updates the beamformers via a second-order cone program (SOCP) and optimizes the antenna orientations using successive convex approximation. To reduce complexity, we further propose an efficient two-stage scheme that first designs the orientations by maximizing a proportional-fair log-utility with manifold-aware Frank-Wolfe updates, and then computes the beamformers with an SOCP-based design. Simulation results demonstrate that the proposed orientation-aware designs achieve a substantially higher worst-user rate than conventional beamforming-only benchmarks. Furthermore, larger antenna directivity enhances fairness under proper orientation but can degrade worst-user performance otherwise.
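The beamforming step in this style of max-min design is typically solved by bisecting over a common SINR target, with each feasibility check cast as an SOCP. Below is a minimal cvxpy sketch of that step under stated assumptions: the antenna orientations are held fixed so the RA directivity is folded into an effective channel matrix H, a single total-power constraint stands in for per-AP limits, and the dimensions, noise power, and bisection range are placeholders rather than the paper's exact formulation.

```python
import numpy as np
import cvxpy as cp

# Placeholder setup: M_ant stacked AP antennas, K users, random effective channels.
M_ant, K = 8, 4
rng = np.random.default_rng(0)
H = (rng.standard_normal((K, M_ant)) + 1j * rng.standard_normal((K, M_ant))) / np.sqrt(2)
P_max, sigma2 = 1.0, 1e-2  # total power budget and noise power (assumed values)

def feasible(gamma):
    """SOCP feasibility check: can every user reach SINR >= gamma?"""
    W = cp.Variable((M_ant, K), complex=True)  # column k beamforms to user k
    cons = [cp.sum_squares(W) <= P_max]        # total-power constraint
    for k in range(K):
        interf = cp.hstack([H[k] @ W[:, j] for j in range(K) if j != k])
        # Rotating the phase so h_k^H w_k is real turns the SINR requirement
        # into a second-order cone constraint.
        cons += [cp.real(H[k] @ W[:, k]) >= np.sqrt(gamma)
                 * cp.norm(cp.hstack([interf, np.sqrt(sigma2) * np.ones(1)])),
                 cp.imag(H[k] @ W[:, k]) == 0]
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate"), W.value

lo, hi, W_best = 0.0, 10.0, None  # bisection over the common SINR target
for _ in range(20):
    mid = 0.5 * (lo + hi)
    ok, W = feasible(mid)
    if ok:
        lo, W_best = mid, W
    else:
        hi = mid
```

In the paper's alternating scheme, the orientation update (successive convex approximation, or the Frank-Wolfe log-utility stage of the two-stage design) would refresh H between outer iterations of this beamforming step.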
Abstract: Remote sensing (RS) large vision-language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referring expressions, such as relative position, relative size, and color cues, thereby constraining performance on implicit VG tasks that require scenario-specific domain knowledge. This article introduces DVGBench, a high-quality implicit VG benchmark for drones covering six major application scenarios: traffic, disaster, security, sport, social activity, and productive activity. Each object is annotated with both explicit and implicit queries. Building on this dataset, we design DroneVG-R1, an LVLM that integrates a novel Implicit-to-Explicit Chain-of-Thought (I2E-CoT) within a reinforcement learning paradigm. This enables the model to exploit scene-specific expertise, converting implicit references into explicit ones and thereby reducing grounding difficulty. Finally, an evaluation of mainstream models on both explicit and implicit VG tasks reveals substantial limitations in their reasoning capabilities. These findings provide actionable insights for advancing the reasoning capacity of LVLMs for drone-based agents. The code and datasets will be released at https://github.com/zytx121/DVGBench.
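To make the reinforcement-learning side concrete, grounding models in this setting are commonly trained with a verifiable reward combining format compliance and box IoU. The sketch below is one such toy reward; the <think>/<answer> tags, the parsing, and the 0.2/0.8 weighting are illustrative assumptions, not DroneVG-R1's published reward design.

```python
import re

def grounding_reward(response: str, gt_box: tuple) -> float:
    """Toy reward for an RL-trained grounding model: format bonus plus IoU.

    Assumes the model reasons inside <think>...</think> (the implicit-to-
    explicit conversion step) and emits a box as <answer>x1,y1,x2,y2</answer>.
    Tag names and the reward mix are hypothetical.
    """
    fmt_ok = bool(re.search(r"<think>.*</think>.*<answer>.*</answer>", response, re.S))
    m = re.search(r"<answer>\s*([\d.]+),\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)\s*</answer>",
                  response)
    if m is None:
        return 0.0
    x1, y1, x2, y2 = map(float, m.groups())
    gx1, gy1, gx2, gy2 = gt_box
    # Intersection-over-union between predicted and ground-truth boxes.
    ix1, iy1 = max(x1, gx1), max(y1, gy1)
    ix2, iy2 = min(x2, gx2), min(y2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    return 0.2 * float(fmt_ok) + 0.8 * iou
```

A reward of this shape lets policy-gradient training credit both a well-formed reasoning trace and an accurate final box, without requiring token-level supervision of the chain of thought.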
Abstract: The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encouraging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are limited to indoor environments, and the robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM system with stereo cameras, termed LSG-SLAM. LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity on rendering losses. To scale to large scenes, we introduce continuous Gaussian Splatting submaps that handle unbounded environments with limited memory. Loops are detected between GS submaps via place recognition, and the relative pose between looped keyframes is optimized using rendering and feature-warping losses. After global optimization of the camera poses and Gaussian points, a structure refinement module enhances reconstruction quality. In extensive evaluations on the EuRoC and KITTI datasets, LSG-SLAM achieves superior performance over existing NeRF-based, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.
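To make the tracking objective concrete, the sketch below combines a photometric rendering loss with a feature-alignment warping term, which is the role such constraints play when repeated, similar-looking textures make a raw rendering loss ambiguous. The tensor shapes, grid_sample-based warp, and loss weight are illustrative assumptions, not LSG-SLAM's exact formulation.

```python
import torch
import torch.nn.functional as F

def tracking_loss(rendered_rgb, target_rgb, feat_ref, feat_cur, uv_warped,
                  w_feat=0.5):
    """Toy tracking objective: photometric rendering loss plus a
    feature-alignment warping term. All tensors and the weighting are
    placeholders.

    rendered_rgb, target_rgb: (3, H, W) rendered and observed images
    feat_ref, feat_cur:       (C, H, W) feature maps from the two frames
    uv_warped:                (H, W, 2) normalized grid warping reference
                              features into the current frame using the
                              current pose/depth estimate
    """
    # Photometric term: penalize rendering error under the current pose.
    l_render = (rendered_rgb - target_rgb).abs().mean()
    # Warp reference features into the current frame and compare; unlike raw
    # colors, learned features are less confused by visually similar regions.
    feat_ref_warped = F.grid_sample(feat_ref[None], uv_warped[None],
                                    align_corners=True)[0]
    l_feat = (feat_ref_warped - feat_cur).abs().mean()
    return l_render + w_feat * l_feat
```

In such a design, uv_warped is differentiable with respect to the camera pose, so both terms backpropagate pose gradients during tracking.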