Hao Li

Jack

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025
Via arXiv

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Jul 24, 2025
Via arXiv

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

Jul 23, 2025
Via arXiv

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation

Jul 16, 2025
Via arXiv

MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model

Jul 16, 2025
Via arXiv

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025
Via arXiv

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Jul 03, 2025
Via arXiv

TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types

Jul 02, 2025
Via arXiv

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

Jul 02, 2025
Via arXiv

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

Jun 24, 2025
Via arXiv