Picture for Han Zhao

Han Zhao

FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes

Add code
Jan 21, 2026
Viaarxiv icon

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Add code
Dec 28, 2025
Viaarxiv icon

Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Add code
Nov 19, 2025
Viaarxiv icon

When LRP Diverges from Leave-One-Out in Transformers

Add code
Oct 21, 2025
Viaarxiv icon

MELA-TTS: Joint transformer-diffusion model with representation alignment for speech synthesis

Add code
Sep 18, 2025
Viaarxiv icon

FunAudio-ASR Technical Report

Add code
Sep 15, 2025
Figure 1 for FunAudio-ASR Technical Report
Figure 2 for FunAudio-ASR Technical Report
Figure 3 for FunAudio-ASR Technical Report
Figure 4 for FunAudio-ASR Technical Report
Viaarxiv icon

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

Add code
Sep 11, 2025
Viaarxiv icon

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Add code
Sep 11, 2025
Viaarxiv icon

Omne-R1: Learning to Reason with Memory for Multi-hop Question Answering

Add code
Aug 24, 2025
Viaarxiv icon

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

Add code
Aug 14, 2025
Viaarxiv icon