Ziang Zhang

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Mar 23, 2026
Via arXiv

Generative Reasoning Recommendation via LLMs

Oct 23, 2025
Via arXiv

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

Oct 21, 2025
Via arXiv

GenSpace: Benchmarking Spatially-Aware Image Generation

May 30, 2025
Via arXiv

Depth Anything with Any Prior

May 15, 2025
Via arXiv

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Dec 24, 2024
Via arXiv

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Oct 28, 2024
Via arXiv

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

Oct 16, 2024
Via arXiv

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024
Via arXiv

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Jul 16, 2024
Via arXiv