Yanming Yang

Laboratory of Ocean acoustics and Remote Sensing, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Marine Physical and Geological Processes, Xiamen, Fujian, China

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

Sep 18, 2025

Chenxi Song, Yanming Yang, Tong Zhao, Ruibo Li, Chi Zhang

Abstract: Recent video diffusion models demonstrate strong potential in spatial intelligence tasks due to their rich latent world priors. However, this potential is hindered by their limited controllability and geometric inconsistency, creating a gap between their strong priors and their practical use in 3D/4D tasks. As a result, current approaches often rely on retraining or fine-tuning, which risks degrading pretrained knowledge and incurs high computational costs. To address this, we propose WorldForge, a training-free, inference-time framework composed of three tightly coupled modules. Intra-Step Recursive Refinement introduces a recursive refinement mechanism during inference, which repeatedly optimizes network predictions within each denoising step to enable precise trajectory injection. Flow-Gated Latent Fusion leverages optical flow similarity to decouple motion from appearance in the latent space and selectively inject trajectory guidance into motion-related channels. Dual-Path Self-Corrective Guidance compares guided and unguided denoising paths to adaptively correct trajectory drift caused by noisy or misaligned structural signals. Together, these components inject fine-grained, trajectory-aligned guidance without training, achieving both accurate motion control and photorealistic content generation. Extensive experiments across diverse benchmarks validate our method's superiority in realism, trajectory consistency, and visual fidelity. This work introduces a novel plug-and-play paradigm for controllable video synthesis, offering a new perspective on leveraging generative priors for spatial intelligence.

* Project Webpage: https://worldforge-agi.github.io/

Inversion of Arctic dual-channel sound speed profile based on random airgun signal

Aug 13, 2025

Jinbao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Benqing Chen, Dewei Xu, Ruichao Xue, Caigao Zeng

Abstract: The Canada Basin and the Chukchi Plateau in the Arctic exhibit distinctive dual-channel sound speed profiles. Based on the propagation characteristics of refracted normal modes under such profiles, this article proposes an inversion method for dual-channel sound speed profiles that uses refracted normal modes. First, a dual-parameter representation tailored to dual-channel sound speed profiles is introduced. Second, a method is proposed for extracting the dispersion structure of refracted normal modes under these profiles. Combining the parameter representation with the dispersion structure extraction yields the inversion method. To handle the horizontal variation of sound speed profiles that is common in long-distance acoustic propagation, the method is extended to invert horizontally varying dual-channel sound speed profiles. Finally, the method is validated with data from an Arctic low-frequency long-range acoustic propagation experiment. Compared with previous sound speed profile inversion methods, the proposed method requires fewer inversion parameters and runs faster, needs only a single hydrophone passively receiving random airgun signals, and also handles horizontally varying sound speed profiles. These properties make it low-cost, easy to deploy, and fast to compute.

Acoustic source depth estimation method based on a single hydrophone in Arctic underwater

Aug 13, 2025

Jinbao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Benqing Chen, Dewei Xu, Ruichao Xue, Caigao Zeng

Abstract: Based on normal mode and ray theory, this article discusses the sound field characteristics for a surface sound source and reception in the surface layer, explores depth estimation methods based on normal modes and on rays, and proposes a depth estimation method based on the upper limit of modal frequency. The applicability and limitations of the different methods are assessed against data. For the surface refracted normal mode waveguide, modes can be separated through a warping transformation. Because normal mode amplitude varies with frequency and mode number, the sound source depth can be estimated by matching amplitude information. Based on the spatial variation of the eigenfunctions with frequency, a sound source depth estimation method that matches the cutoff frequency of normal modes is proposed. For the deep Arctic sea, the sound ray arrival structure at the receiver is obtained by analyzing deep inversion sound ray trajectories, and the sound source depth can be estimated by matching the time difference of ray arrivals. Experimental data are used to verify the sound field patterns and the effectiveness of the sound source depth estimation methods.
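
The idea of estimating source depth by matching the time difference of ray arrivals can be illustrated with a deliberately simplified sketch. It assumes straight rays in a constant-sound-speed ocean, a known source range, and only two arrivals (direct and surface-reflected); this is not the deep Arctic ray structure analyzed in the paper, and the function names, the default sound speed, and the candidate grid are all hypothetical.

```python
import math


def arrival_time_difference(source_depth, receiver_depth, range_m, sound_speed=1450.0):
    """Delay (s) between the surface-reflected and the direct arrival.

    Assumption: straight rays in an isovelocity ocean. The surface
    reflection is modeled with an image source mirrored above the surface.
    """
    direct = math.hypot(range_m, receiver_depth - source_depth)
    reflected = math.hypot(range_m, receiver_depth + source_depth)
    return (reflected - direct) / sound_speed


def estimate_source_depth(measured_delay, receiver_depth, range_m,
                          candidate_depths, sound_speed=1450.0):
    """Grid search: pick the candidate depth whose modeled delay is closest."""
    def mismatch(depth):
        modeled = arrival_time_difference(depth, receiver_depth, range_m, sound_speed)
        return abs(modeled - measured_delay)

    return min(candidate_depths, key=mismatch)


if __name__ == "__main__":
    # Toy check: simulate a delay for a 50 m source, then recover the depth.
    candidates = [float(d) for d in range(1, 301)]
    measured = arrival_time_difference(50.0, 200.0, 5000.0)
    print(estimate_source_depth(measured, 200.0, 5000.0, candidates))
```

In this toy geometry the delay grows monotonically with source depth, so the grid search has a single best match; a realistic Arctic profile would need a ray-tracing model in place of `arrival_time_difference`.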

FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing

Jun 05, 2025

Guangzhao Li, Yanming Yang, Chenxi Song, Chi Zhang

Abstract: Text-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely on inversion-based techniques that map input videos into the latent space, which often leads to temporal inconsistencies and degraded structural fidelity. To address this, we propose FlowDirector, a novel inversion-free video editing framework. Our framework models the editing process as a direct evolution in data space, guiding the video via an Ordinary Differential Equation (ODE) to smoothly transition along its inherent spatiotemporal manifold, thereby preserving temporal coherence and structural details. To achieve localized and controllable edits, we introduce an attention-guided masking mechanism that modulates the ODE velocity field, preserving non-target regions both spatially and temporally. Furthermore, to address incomplete edits and enhance semantic alignment with editing instructions, we present a guidance-enhanced editing strategy inspired by Classifier-Free Guidance, which leverages differential signals between multiple candidate flows to steer the editing trajectory toward stronger semantic alignment without compromising structural consistency. Extensive experiments across benchmarks demonstrate that FlowDirector achieves state-of-the-art performance in instruction adherence, temporal consistency, and background preservation, establishing a new paradigm for efficient and coherent video editing without inversion.

* Project Page: https://flowdirector-edit.github.io
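
The two mechanisms named in the FlowDirector abstract, a mask that modulates an ODE velocity field and a Classifier-Free-Guidance-style combination of candidate flows, can be sketched generically on a toy vector. This is not the authors' implementation: the velocity callables, the binary mask, the `scale` parameter, and the use of plain Euler integration are assumptions made only for illustration.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def guided_velocity(v_base: Vector, v_target: Vector, scale: float) -> Vector:
    """CFG-style extrapolation: v_base + scale * (v_target - v_base)."""
    return [b + scale * (t - b) for b, t in zip(v_base, v_target)]


def masked_euler_edit(x0: Vector, velocity_base: VelocityFn, velocity_target: VelocityFn,
                      mask: Vector, scale: float = 1.5, steps: int = 50) -> Vector:
    """Integrate dx/dt = mask * v_guided(x, t) from t = 0 to t = 1 with Euler steps.

    Entries where mask == 0 never move, which mimics preserving
    non-target regions; entries where mask == 1 follow the guided flow.
    """
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = guided_velocity(velocity_base(x, t), velocity_target(x, t), scale)
        x = [xi + dt * m * vi for xi, m, vi in zip(x, mask, v)]
    return x


if __name__ == "__main__":
    target = [1.0, 1.0, 1.0, 1.0]
    keep_source = lambda x, t: [0.0 for _ in x]                      # hypothetical "no edit" flow
    move_to_target = lambda x, t: [ti - xi for ti, xi in zip(target, x)]  # hypothetical edit flow
    edited = masked_euler_edit([0.0] * 4, keep_source, move_to_target, [1.0, 1.0, 0.0, 0.0])
    print(edited)  # first two entries move toward 1, last two stay at 0
```

In a real system the velocity callables would be evaluations of a pretrained flow model under different prompts and the mask would come from attention maps; here they are stand-ins chosen so the behavior is easy to inspect.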

Research and experimental verification on low-frequency long-range underwater sound propagation dispersion characteristics under dual-channel sound speed profiles in the Chukchi Plateau

Nov 13, 2023

Jinbao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Ruichao Xue

Abstract: The dual-channel sound speed profiles (SSPs) of the Chukchi Plateau and the Canada Basin have become a research hotspot because of their excellent low-frequency sound propagation. Previous research has mainly used sound propagation theory to explain changes in sound signal energy. This article instead uses normal mode theory to study the fine structure of low-frequency wide-band sound propagation dispersion under dual-channel SSPs. It explains the intersection of normal mode dispersion curves caused by the dual-channel SSP, analyzes the blocking effect of seabed terrain changes on dispersion structures, and separates the normal modes using a modified warping operator. These results are verified with a long-range seismic exploration experiment at the Chukchi Plateau. In addition, based on the acoustic signal characteristics in this environment, two methods for estimating sound source range are proposed, and both are verified with the at-sea experimental data.

* 30 pages, 18 figures
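
Mode separation by warping, mentioned in the abstract above, can be illustrated with the classical ideal-waveguide warping function h(τ) = sqrt(τ² + t_r²), where t_r is the first-arrival time (range divided by sound speed). The paper uses a modified operator for refracted modes under dual-channel profiles, which is not reproduced here. The sketch assumes the signal starts exactly at t_r, uses linear interpolation, and tests on a synthetic single mode; all function names are hypothetical.

```python
import math


def synth_ideal_mode(f_cutoff, fs, t_r, n_samples):
    """Toy ideal-waveguide mode: cos(2*pi*f_c*sqrt(t^2 - t_r^2)), t = t_r + n/fs."""
    out = []
    for n in range(n_samples):
        t = t_r + n / fs
        out.append(math.cos(2 * math.pi * f_cutoff * math.sqrt(max(t * t - t_r * t_r, 0.0))))
    return out


def warp_ideal_waveguide(signal, fs, t_r):
    """Return sqrt(h'(tau)) * s(h(tau)) sampled at tau = k/fs, h(tau) = sqrt(tau^2 + t_r^2).

    Assumption: signal[n] was recorded at absolute time t = t_r + n/fs.
    After warping, each ideal mode becomes a tone at its cutoff frequency.
    """
    t_end = t_r + (len(signal) - 1) / fs
    tau_max = math.sqrt(t_end ** 2 - t_r ** 2)
    warped = []
    for k in range(int(tau_max * fs) + 1):
        tau = k / fs
        h = math.sqrt(tau * tau + t_r * t_r)
        pos = (h - t_r) * fs                      # fractional index into signal
        i = min(int(pos), len(signal) - 2)
        frac = pos - i
        value = (1.0 - frac) * signal[i] + frac * signal[i + 1]  # linear interpolation
        warped.append(math.sqrt(tau / h) * value)  # h'(tau) = tau / h(tau)
    return warped


def dominant_frequency(samples, fs, freqs):
    """Frequency in `freqs` with the largest DFT magnitude (direct summation)."""
    best_f, best_mag = freqs[0], -1.0
    for f in freqs:
        w = 2 * math.pi * f / fs
        re = sum(x * math.cos(w * n) for n, x in enumerate(samples))
        im = sum(x * math.sin(w * n) for n, x in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


if __name__ == "__main__":
    fs, t_r = 2000.0, 1.0
    mode = synth_ideal_mode(10.0, fs, t_r, 2400)
    grid = [0.5 * k for k in range(2, 61)]        # 1 Hz to 30 Hz in 0.5 Hz steps
    print(dominant_frequency(warp_ideal_waveguide(mode, fs, t_r), fs, grid))
```

Once modes are mapped to separate tones in the warped domain, they can be isolated with ordinary band-pass filtering and unwarped; that filtering step is omitted here.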
