Qi Sun

From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems

May 21, 2025
Via arXiv

Advancing Sequential Numerical Prediction in Autoregressive Models

May 19, 2025
Via arXiv

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Apr 28, 2025
Via arXiv

Nano-3D: Metasurface-Based Neural Depth Imaging

Mar 20, 2025
Via arXiv

Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

Jan 16, 2025
Via arXiv

$\text{Transformer}^2$: Self-adaptive LLMs

Jan 14, 2025
Via arXiv

Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer

Dec 24, 2024
Via arXiv

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Dec 17, 2024
Via arXiv

FovealNet: Advancing AI-Driven Gaze Tracking Solutions for Optimized Foveated Rendering System Performance in Virtual Reality

Dec 12, 2024
Via arXiv

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Dec 10, 2024
Via arXiv