Manyuan Zhang

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Dec 19, 2025 · via arXiv

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Dec 10, 2025 · via arXiv

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025 · via arXiv

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

Oct 26, 2025 · via arXiv

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Mar 27, 2025 · via arXiv

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

May 01, 2024 · via arXiv

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Jan 31, 2024 · via arXiv

Towards Large-scale Masked Face Recognition

Oct 25, 2023 · via arXiv

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Oct 24, 2023 · via arXiv

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

Mar 17, 2023 · via arXiv