Zhijian Ou

Tsinghua University

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment

Feb 23, 2026

Phoneme-based speech recognition driven by large language models and sampling marginalization

Dec 20, 2025

Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation

Aug 25, 2025

Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform

Jun 13, 2025

LLM-based phoneme-to-grapheme for phoneme-based speech recognition

Jun 05, 2025

Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning

May 24, 2025

Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning

May 24, 2025

An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought

Jul 22, 2024

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

Jul 18, 2024

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

Jul 14, 2024