
Yong Man Ro

Robust Grounding with MLLMs against Occlusion and Small Objects via Language-guided Semantic Cues

Apr 27, 2026
Via arXiv

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Mar 29, 2026
Via arXiv

Recursive Think-Answer Process for LLMs and VLMs

Mar 03, 2026
Via arXiv

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

Jan 29, 2026
Via arXiv

Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning

Jan 05, 2026
Via arXiv

GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory

Nov 15, 2025
Via arXiv

Unified Reinforcement and Imitation Learning for Vision-Language Models

Oct 22, 2025
Via arXiv

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Jun 18, 2025
Via arXiv

Language-guided Learning for Object Detection Tackling Multiple Variations in Aerial Images

May 29, 2025
Via arXiv

DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes

May 29, 2025
Via arXiv