Zuxuan Wu

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Jun 13, 2024

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Jun 11, 2024

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

Jun 11, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Jun 10, 2024

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Jun 06, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Jun 06, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

May 30, 2024

ModelLock: Locking Your Model With a Spell

May 25, 2024