Shujie Liu

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023

WavMark: Watermarking for Audio Generation

Aug 24, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023

On decoder-only architecture for speech-to-text and large language model integration

Jul 14, 2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Jun 28, 2023

Accelerating Transducers through Adjacent Token Merging

Jun 28, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

May 25, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 24, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR

Mar 20, 2023

Target Sound Extraction with Variable Cross-modality Clues

Mar 15, 2023