Qinhan Lv

Stable Language Guidance for Vision-Language-Action Models

Jan 07, 2026

CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation

Dec 27, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Dec 09, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025