Songjun Cao

STEB: A Speech-to-Speech Translation Expressiveness Benchmark for Evaluating Beyond Translation Fidelity

Jun 24, 2026

MathVis-Fine: Aligning Visual Supervision with Necessity via Progressive Dependency-Guided Training for Multimodal Mathematical Reasoning

Jun 16, 2026

Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

Apr 24, 2026

Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning

Mar 19, 2026

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt

May 24, 2025

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception

Apr 09, 2025

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

Feb 27, 2025

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition

Jan 11, 2025

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

Aug 18, 2024

DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

Mar 16, 2023