Ruoyu Wang

Efficient and Generalizable Environmental Understanding for Visual Navigation

Jun 18, 2025
Via arXiv

Exploring Speaker Diarization with Mixture of Experts

Jun 17, 2025
Via arXiv

Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos

May 19, 2025
Via arXiv

Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding

Apr 25, 2025
Via arXiv

Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework

Apr 22, 2025
Via arXiv

L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model

Mar 16, 2025
Via arXiv

Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent

Feb 17, 2025
Via arXiv

Latent Swap Joint Diffusion for Long-Form Audio Generation

Feb 07, 2025
Via arXiv

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation

Dec 06, 2024
Via arXiv

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation

Nov 28, 2024
Via arXiv