Xingyu Zhang

CAIFormer: A Causal Informed Transformer for Multivariate Time Series Forecasting

May 22, 2025

SLOT: Sample-specific Language Model Optimization at Test-time

May 18, 2025

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Mar 07, 2025

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Mar 05, 2025

AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

Jan 28, 2025

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Jan 08, 2025

Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network

Dec 13, 2024

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Nov 15, 2024

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Oct 29, 2024

HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Oct 07, 2024