
Yumeng Li

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

Jun 02, 2026
Via arXiv

EchoAvatar: Real-time Generative Avatar Animation from Audio Streams

May 27, 2026
Via arXiv

MosaicMem: Hybrid Spatial Memory for Controllable Video World Models

Mar 17, 2026
Via arXiv

Multimodal OCR: Parse Anything from Documents

Mar 13, 2026
Via arXiv

PLACID: Identity-Preserving Multi-Object Compositing via Video Diffusion with Synthetic Trajectories

Jan 30, 2026
Via arXiv

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models

Jul 27, 2025
Via arXiv

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Mar 17, 2025
Via arXiv

Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection

Jan 03, 2025
Via arXiv

Slow Perception: Let's Perceive Geometric Figures Step-by-step

Dec 30, 2024
Via arXiv

Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models

Nov 27, 2024
Via arXiv