
Shifeng Zhang

AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation

Aug 21, 2025
Via arXiv

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

May 26, 2025
Via arXiv

Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis

Apr 20, 2025
Via arXiv

Structure-Aware Correspondence Learning for Relative Pose Estimation

Mar 24, 2025
Via arXiv

Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation

Mar 19, 2025
Via arXiv

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Nov 22, 2024
Via arXiv

Generating Compositional Scenes via Text-to-image RGBA Instance Generation

Nov 16, 2024
Via arXiv

Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look

Oct 16, 2024
Via arXiv

Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

Jun 25, 2024
Via arXiv

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Apr 03, 2024
Via arXiv