Dahua Lin

Eric

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Aug 06, 2025
Via arXiv

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Aug 01, 2025
Via arXiv

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Jul 22, 2025
Via arXiv

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

Jun 24, 2025
Via arXiv

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Jun 24, 2025
Via arXiv

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions

Jun 11, 2025
Via arXiv

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Jun 09, 2025
Via arXiv

Video World Models with Long-term Spatial Memory

Jun 05, 2025
Via arXiv

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

May 29, 2025
Via arXiv

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

May 29, 2025
Via arXiv