Wenhui Tan

Xiaomi MiMo-VL-Miloco Technical Report

Dec 22, 2025

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

Dec 14, 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Nov 17, 2025

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

May 22, 2025

Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion

Mar 12, 2024

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Jun 25, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

May 30, 2023