Xiangyu Zhang

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Jan 09, 2026
Via arXiv

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Jan 01, 2026
Via arXiv

Step-DeepResearch Technical Report

Dec 24, 2025
Via arXiv

Step-GUI Technical Report

Dec 19, 2025
Via arXiv

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Nov 12, 2025
Via arXiv

Step-Audio-EditX Technical Report

Nov 05, 2025
Via arXiv

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

Oct 10, 2025
Via arXiv

An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Oct 09, 2025
Via arXiv

Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech

Sep 19, 2025
Via arXiv

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

Aug 26, 2025
Via arXiv