Yuwei Lu

VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

Apr 02, 2026

Mengtian Li, Yuwei Lu, Feifei Li, Chenqi Gan, Zhifeng Xie, Xi Wang

Abstract: Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this "director in the loop" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fine-tuned vision-language model then scores these previews using our proposed cyclic semantic similarity mechanism, which aligns renders with text prompts. This process provides the visual preference signals for Direct Preference Optimization (DPO) post-training. Both quantitative evaluations and user studies on Unity renders and diffusion-based Camera-to-Video pipelines show consistent gains in condition adherence, framing quality, and perceptual realism. Notably, VERTIGO reduces the character off-screen rate from 38% to nearly 0% while preserving the geometric fidelity of camera motion. User study participants further prefer VERTIGO over baselines across composition, consistency, prompt adherence, and aesthetic quality, confirming the perceptual benefits of our visual preference post-training.

* 28 pages, 10 figures, ECCV 2026
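
The VERTIGO abstract names Direct Preference Optimization (DPO) as the post-training step. As a rough illustration only, the standard DPO objective for a single preference pair can be sketched as below. This is the generic DPO loss, not the paper's implementation; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example, and how VERTIGO computes trajectory log-probabilities is not stated in the abstract.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair.

    Inputs are log-probabilities of each sample under the policy being
    trained and under a frozen reference model. All names here are
    hypothetical and chosen for illustration.
    """
    # Implicit reward margin: how much more the policy favors the preferred
    # sample over the rejected one, relative to the reference model.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the preferred sample.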

Mind-of-Director: Multi-modal Agent-Driven Film Previsualization via Collaborative Decision-Making

Mar 16, 2026

Shufeng Nan, Mengtian Li, Sixiao Zheng, Yuwei Lu, Han Zhang, Yanwei Fu

Abstract: We present Mind-of-Director, a multi-modal agent-driven framework for film previz that models the collaborative decision-making process of a film production team. Given a creative idea, Mind-of-Director orchestrates multiple specialized agents to produce previz sequences within the game engine. The framework consists of four cooperative modules: Script Development, where agents draft and refine the screenplay iteratively; Virtual Scene Design, which transforms text into semantically aligned 3D environments; Character Behaviour Control, which determines character blocking and motion; and Camera Planning, which optimizes framing, movement, and composition for cinematic camera effects. A real-time visual editing system built in the game engine further enables interactive inspection and synchronized timeline adjustment across scenes, behaviours, and cameras. Extensive experiments and human evaluations show that Mind-of-Director generates high-quality, semantically grounded previz sequences in approximately 25 minutes per idea, demonstrating the effectiveness of agent collaboration for both automated prototyping and human-in-the-loop filmmaking.

* 10 pages, 4 figures

Forward Vehicle Collision Warning Based on Quick Camera Calibration

Apr 22, 2019

Yuwei Lu, Yuan Yuan, Qi Wang

Abstract: Forward Vehicle Collision Warning (FCW) is one of the most important functions for autonomous vehicles. In this procedure, vehicle detection and distance measurement are core components, requiring accurate localization and estimation. In this paper, we propose a simple but efficient forward vehicle collision warning framework that combines monocular distance measurement with precise vehicle detection. To obtain the forward vehicle distance, we use a quick camera calibration method that needs only three physical points to calibrate the relevant camera parameters. For forward vehicle detection, we propose a multi-scale detection algorithm that uses the calibration result as a distance prior to improve precision. Extensive experiments on our self-built real-scene dataset demonstrate the effectiveness of the proposed framework.
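
The abstract above does not spell out how distance is derived from the calibrated camera. As a loosely related illustration, a common textbook approach to monocular distance on a flat road uses the pinhole model: a point on the ground that projects `dv` pixel rows below the horizon lies at distance `f * H / dv`, where `f` is the focal length in pixels and `H` is the camera height. The sketch below assumes a flat road and zero camera roll; it is not the three-point calibration method from the paper, and every name in it is hypothetical.

```python
def ground_distance(v_bottom, v_horizon, focal_px, cam_height_m):
    """Estimate distance (meters) to a point on a flat road from one image.

    v_bottom:     image row of the vehicle's bottom edge (pixels)
    v_horizon:    image row of the horizon line (pixels)
    focal_px:     focal length expressed in pixels
    cam_height_m: camera height above the road (meters)

    Flat-ground pinhole approximation; all parameter names are assumptions
    made for this example.
    """
    dv = v_bottom - v_horizon
    if dv <= 0:
        # A ground point must project below the horizon line.
        raise ValueError("bottom edge must lie below the horizon row")
    return focal_px * cam_height_m / dv
```

For example, with a 1000-pixel focal length and a camera mounted 1.5 m above the road, a bottom edge 100 rows below the horizon corresponds to 15 m under these assumptions.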

Tracking as A Whole: Multi-Target Tracking by Modeling Group Behavior with Sequential Detection

Apr 22, 2019

Yuan Yuan, Yuwei Lu, Qi Wang

Abstract: Video-based vehicle detection and tracking is one of the most important components of Intelligent Transportation Systems (ITS). At road junctions, the problem becomes even more difficult due to occlusions and complex interactions among vehicles. To obtain precise detection and tracking results, we propose a novel tracking-by-detection framework. In the detection stage, we present a sequential detection model to deal with severe occlusions. In the tracking stage, we model group behavior to handle complex interactions with overlaps and ambiguities. The main contributions of this paper are twofold: 1) A shape prior is exploited in the sequential detection model to tackle occlusions in crowded scenes. 2) A traffic force is defined in the traffic scene to model group behavior, which helps handle complex interactions among vehicles. We evaluate the proposed approach on real surveillance videos at road junctions, and the results demonstrate the effectiveness of our method.
