Yu Qi

Residual Rotation Correction using Tactile Equivariance

Nov 11, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025

I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs

Jun 17, 2025

EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation

May 27, 2025

Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision

May 14, 2025

Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

Apr 09, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025

ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

Jul 16, 2024

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Sep 27, 2023

STGIN: Spatial-Temporal Graph Interaction Network for Large-scale POI Recommendation

Sep 05, 2023