Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mu Liu

Rethinking Position Embedding as a Context Controller for Multi-Reference and Multi-Shot Video Generation

Apr 04, 2026

Binyuan Huang, Yuning Lu, Weinan Jia, Hualiang Wang, Mu Liu, Daiqing Yang

Abstract:Recent proprietary models such as Sora2 demonstrate promising progress in generating multi-shot videos conditioned on multiple reference characters. However, academic research on this problem remains limited. We study this task and identify a core challenge: when reference images exhibit highly similar appearances, the model often suffers from reference confusion, where semantically similar tokens degrade the model's ability to retrieve the correct context. To address this, we introduce PoCo (Position Embedding as a Context Controller), which incorporates position encoding as additional context control beyond semantic retrieval. By employing side information of tokens, PoCo enables precise token-level matching while preserving implicit semantic consistency modeling. Building on PoCo, we develop a multi-reference and multi-shot video generation model capable of reliably controlling characters with extremely similar visual traits. Extensive experiments demonstrate that PoCo improves cross-shot consistency and reference fidelity compared with various baselines.

* Accepted to CVPR 2026

Via

Access Paper or Ask Questions

Fusion Matrix Prompt Enhanced Self-Attention Spatial-Temporal Interactive Traffic Forecasting Framework

Oct 12, 2024

Mu Liu, MingChen Sun YingJi Li, Ying Wang

Abstract:Recently, spatial-temporal forecasting technology has been rapidly developed due to the increasing demand for traffic management and travel planning. However, existing traffic forecasting models still face the following limitations. On one hand, most previous studies either focus too much on real-world geographic information, neglecting the potential traffic correlation between different regions, or overlook geographical position and only model the traffic flow relationship. On the other hand, the importance of different time slices is ignored in time modeling. Therefore, we propose a Fusion Matrix Prompt Enhanced Self-Attention Spatial-Temporal Interactive Traffic Forecasting Framework (FMPESTF), which is composed of spatial and temporal modules for down-sampling traffic data. The network is designed to establish a traffic fusion matrix considering spatial-temporal heterogeneity as a query to reconstruct a data-driven dynamic traffic data structure, which accurately reveal the flow relationship of nodes in the traffic network. In addition, we introduce attention mechanism in time modeling, and design hierarchical spatial-temporal interactive learning to help the model adapt to various traffic scenarios. Through extensive experimental on six real-world traffic datasets, our method is significantly superior to other baseline models, demonstrating its efficiency and accuracy in dealing with traffic forecasting problems.

* THE WEB CONFERENCE 2025

Via

Access Paper or Ask Questions