
Qi Sun

Animus3D: Text-driven 3D Animation via Motion Score Distillation

Dec 14, 2025 (via arXiv)

Unitho: A Unified Multi-Task Framework for Computational Lithography

Nov 14, 2025 (via arXiv)

OregairuChar: A Benchmark Dataset for Character Appearance Frequency Analysis in My Teen Romantic Comedy SNAFU

Nov 07, 2025 (via arXiv)

GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts

Sep 10, 2025 (via arXiv)

Cost-Aware Routing for Efficient Text-To-Image Generation

Jun 17, 2025 (via arXiv)

From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems

May 21, 2025 (via arXiv)

Advancing Sequential Numerical Prediction in Autoregressive Models

May 19, 2025 (via arXiv)

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Apr 28, 2025 (via arXiv)

Nano-3D: Metasurface-Based Neural Depth Imaging

Mar 20, 2025 (via arXiv)

Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

Jan 16, 2025 (via arXiv)