
Yu Qi

MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints

Mar 20, 2026

Towards Compositional Generalization in LLMs for Smart Contract Security: A Case Study on Reentrancy Vulnerabilities

Jan 11, 2026

Residual Rotation Correction using Tactile Equivariance

Nov 11, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025

I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs

Jun 17, 2025

EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation

May 27, 2025

Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision

May 14, 2025

Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

Apr 09, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025

ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

Jul 16, 2024