Hang Xu

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

Mar 16, 2025
Via arXiv

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Mar 12, 2025
Via arXiv

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Mar 09, 2025
Via arXiv

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv

Towards Heisenberg limit without critical slowing down via quantum reinforcement learning

Mar 04, 2025
Via arXiv

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Feb 25, 2025
Via arXiv

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Feb 21, 2025
Via arXiv

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Feb 12, 2025
Via arXiv

FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise

Feb 05, 2025
Via arXiv

A Study of In-Context-Learning-Based Text-to-SQL Errors

Jan 16, 2025
Via arXiv