Abstract:Camera-conditioned video generation requires positional encoding that remains reliable under changes in camera motion, lens configuration, and scene structure. However, existing attention-level camera encodings either provide ray-only camera signals or rely on pinhole camera geometry, limiting their applicability to general camera control under the Unified Camera Model, including wide-angle and fisheye lenses. To address this limitation, we propose Curved Ray Expectation Positional Encoding (CRePE). CRePE represents each image token as a depth-aware positional distribution along its source ray, providing a Unified Camera Model-compatible positional encoding that captures the projected-path geometry induced by wide-angle and fisheye cameras. CRePE is implemented through a Geometric Attention Adapter added to frozen video DiTs, injecting token-wise scene-distance information into selected attention layers and stabilizing it with pseudo supervision from a monocular geometry foundation model. This design leads to more stable camera control and improves several geometry-aware and perceptual-quality metrics, while remaining competitive on video-quality metrics. Controlled positional-encoding ablations show a better overall average rank than a RayRoPE-style endpoint PE baseline, demonstrating the effectiveness of UCM-aware projected-path integration across diverse camera models. Furthermore, by extending the same positional-encoding pathway to external geometry control through Radial MixForcing, CRePE supports external radial-map control for scene-geometry-conditioned generation and source-video motion transfer beyond camera control.
Abstract:Recent 4D generation methods complete scene-level missing information using generative models and reconstruct the scene into radiance-based representations. However, these pipelines often present geometric inconsistencies in the generated content, and the radiance-based reconstruction requires expensive optimization. Furthermore, radiance-based representations often absorb these geometric inconsistencies into their view-dependent nature, failing to enforce the grounded geometric consistency. To address these issues, we propose Geometric 4D Stitching, an efficient framework that explicitly identifies missing geometric regions and complements them with geometrically grounded 4D stitches. As a result, our method constructs 4D scene representations in under 10 minutes on a single NVIDIA RTX 5090 GPU per one-step scene expansion, while improving geometric consistency. Moreover, we demonstrate that our explicit 4D stitching supports interative expansion of 4D mesh as well as 4D scene editing.
Abstract:As wireless networks continue to evolve, stringent latency and reliability requirements and highly dynamic channels expose fundamental limitations of gNB-centric massive multiple-input multiple-output (mMIMO) architectures, motivating a rethinking of the user equipment (UE) role. In response, the UE is transitioning from a passive transceiver into an active entity that directly contributes to system-level performance. In this context, this article examines the evolving role of the UE in mMIMO systems during the transition from fifth-generation (5G) to sixth-generation (6G), bridging third generation partnership project (3GPP) standardization, device implementation, and architectural innovation. Through a chronological review of 3GPP Releases 15 to 19, we highlight the progression of UE functionalities from basic channel state information (CSI) reporting to artificial intelligence (AI) and machine learning (ML)-based CSI enhancement and UE-initiated beam management. We further examine key implementation challenges, including multi-panel UE (MPUE) architectures, on-device intelligent processing, and energy-efficient operation, and then discuss corresponding architectural innovations under practical constraints. Using digital-twin-based evaluations, we validate the impact of emerging UE-centric functionalities, illustrating that UE-initiated beam reporting improves throughput in realistic mobility scenarios, while a multi-panel architecture enhances link robustness compared with a single-panel UE.