Guangzhao Li

FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing

Jun 05, 2025

Guangzhao Li, Yanming Yang, Chenxi Song, Chi Zhang

Abstract: Text-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely on inversion-based techniques that map input videos into the latent space, which often leads to temporal inconsistencies and degraded structural fidelity. To address this, we propose FlowDirector, a novel inversion-free video editing framework. Our framework models the editing process as a direct evolution in data space, guiding the video via an Ordinary Differential Equation (ODE) to smoothly transition along its inherent spatiotemporal manifold, thereby preserving temporal coherence and structural details. To achieve localized and controllable edits, we introduce an attention-guided masking mechanism that modulates the ODE velocity field, preserving non-target regions both spatially and temporally. Furthermore, to address incomplete edits and enhance semantic alignment with editing instructions, we present a guidance-enhanced editing strategy inspired by Classifier-Free Guidance, which leverages differential signals between multiple candidate flows to steer the editing trajectory toward stronger semantic alignment without compromising structural consistency. Extensive experiments across benchmarks demonstrate that FlowDirector achieves state-of-the-art performance in instruction adherence, temporal consistency, and background preservation, establishing a new paradigm for efficient and coherent video editing without inversion.

* Project page: https://flowdirector-edit.github.io
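
The idea of a mask modulating an ODE velocity field, as described in the abstract, can be illustrated with a toy example. The sketch below is not the authors' implementation: `velocity_fn` is a hypothetical stand-in for a model's velocity field, `mask` is a hand-made binary array standing in for an attention-derived mask, and plain Euler integration is assumed as the solver.

```python
import numpy as np

def masked_euler_edit(video, velocity_fn, mask, num_steps=10):
    """Integrate dx/dt = mask * v(x, t) from t=0 to t=1 with Euler steps.

    video:       array of shape (frames, height, width)
    velocity_fn: hypothetical callable (x, t) -> array shaped like x
    mask:        array shaped like video; 1 = editable, 0 = preserved
    """
    x = video.astype(np.float64).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # Where mask is 0 the velocity is zeroed, so those pixels never move.
        x = x + dt * mask * velocity_fn(x, t)
    return x

# Toy setup: a velocity that pulls every pixel toward a target value of 1.0.
frames = np.zeros((2, 4, 4))
mask = np.zeros_like(frames)
mask[:, :2, :] = 1.0          # only the top half of each frame is editable
target = np.ones_like(frames)

edited = masked_euler_edit(frames, lambda x, t: target - x, mask)
```

In this toy run the top half of each frame moves toward the target while the bottom half stays exactly at its input values, which is the preservation behaviour the abstract attributes to masking.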

Dynamic Dense RGB-D SLAM using Learning-based Visual Odometry

May 12, 2022

Shihao Shen, Yilin Cai, Jiayi Qiu, Guangzhao Li

Abstract: We propose a dense dynamic RGB-D SLAM pipeline built on a learning-based visual odometry, TartanVO. Like other direct (rather than feature-based) methods, TartanVO estimates camera pose from dense optical flow, which is valid only for static scenes and disregards dynamic objects. Because of the color constancy assumption, optical flow cannot differentiate between dynamic and static pixels. Therefore, to reconstruct a static map with such direct methods, our pipeline resolves dynamic/static segmentation by leveraging the optical flow output and fuses only static points into the map. Moreover, we re-render the input frames with the dynamic pixels removed and iteratively pass them back into the visual odometry to refine the pose estimate.

* 7 pages, 10 figures. Our code is available at https://github.com/Geniussh/Dynamic-Dense-RGBD-SLAM-with-TartanVO
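
One common way to separate dynamic from static pixels using optical flow is to compare the observed flow with the flow that camera motion alone would induce, and to flag pixels with a large residual. The sketch below illustrates that general idea only; it is not taken from the linked repository, and the function names, the precomputed `rigid_flow` input, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def segment_dynamic(observed_flow, rigid_flow, threshold=1.0):
    """Return a boolean (H, W) mask that is True where a pixel looks dynamic.

    observed_flow: (H, W, 2) optical flow estimated between two frames
    rigid_flow:    (H, W, 2) flow expected from camera motion alone (assumed given)
    threshold:     residual magnitude in pixels above which a pixel is dynamic
    """
    residual = np.linalg.norm(observed_flow - rigid_flow, axis=-1)
    return residual > threshold

def static_points(points, dynamic_mask):
    """Keep only the 3D points whose pixels were classified as static."""
    return points[~dynamic_mask]

# Toy data: a 4x4 image where one pixel moves independently of the camera.
observed = np.zeros((4, 4, 2))
observed[1, 1] = [3.0, 4.0]            # residual magnitude 5 at this pixel
rigid = np.zeros((4, 4, 2))
points = np.random.default_rng(0).normal(size=(4, 4, 3))

dynamic_mask = segment_dynamic(observed, rigid)
fused = static_points(points, dynamic_mask)   # points that would enter the map
```

Only the points returned by `static_points` would be fused into the map in this sketch; the single flagged pixel is dropped.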
