Shuai Yang

WORLDMEM: Long-term Consistent World Simulation with Memory

Apr 16, 2025

Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation

Apr 13, 2025

HeteRAG: A Heterogeneous Retrieval-augmented Generation Framework with Decoupled Knowledge Representations

Apr 12, 2025

OmniCam: Unified Multimodal Video Generation via Camera Control

Apr 03, 2025

A Survey on Remote Sensing Foundation Models: From Vision to Multimodality

Mar 28, 2025

Language-based Image Colorization: A Benchmark and Beyond

Mar 19, 2025

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Mar 12, 2025

MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

Mar 11, 2025

Balanced Image Stylization with Style Matching Score

Mar 10, 2025

OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

Mar 08, 2025