Haoyang Zhang

Tony

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

Jan 05, 2026
Via arXiv

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Jan 01, 2026
Via arXiv

OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition

Dec 18, 2025
Via arXiv

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

Oct 10, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025

Jun 16, 2025
Via arXiv

Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage

Jun 06, 2025
Via arXiv

Generating Multimodal Driving Scenes via Next-Scene Prediction

Mar 19, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

Oct 13, 2023
Via arXiv