Ruohan Gao

University of Maryland, College Park

Do Audio-Visual Large Language Models Really See and Hear?

Apr 03, 2026

SonoWorld: From One Image to a 3D Audio-Visual Scene

Mar 30, 2026

AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation

Oct 01, 2025

Towards Perception-Informed Latent HRTF Representations

Jul 03, 2025

ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image

May 28, 2025

Learning to Highlight Audio by Watching Movies

May 17, 2025

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Apr 30, 2025

Hearing Anywhere in Any Environment

Apr 14, 2025

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Jan 03, 2025