Zhen Ye

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Oct 10, 2025, via arXiv

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets

Sep 18, 2025, via arXiv

MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement

Sep 16, 2025, via arXiv

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis

Aug 08, 2025, via arXiv

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Jul 02, 2025, via arXiv

Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method

May 20, 2025, via arXiv

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge

May 17, 2025, via arXiv

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025, via arXiv

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025, via arXiv

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Feb 06, 2025, via arXiv