Pengfei Wan

Omni-o3: Deep Nested Omnimodal Deduction for Deliberative Audio-Visual Reasoning

Apr 27, 2026
Via arXiv

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Apr 06, 2026
Via arXiv

S3KF: Spherical State-Space Kalman Filtering for Panoramic 3D Multi-Object Tracking

Mar 29, 2026
Via arXiv

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Mar 26, 2026
Via arXiv

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Mar 26, 2026
Via arXiv

Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training

Mar 26, 2026
Via arXiv

Kling-MotionControl Technical Report

Mar 03, 2026
Via arXiv

Analytic Score Optimization for Multi Dimension Video Quality Assessment

Feb 18, 2026
Via arXiv

Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings

Feb 14, 2026
Via arXiv

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Feb 09, 2026
Via arXiv