Qiushi Zhu

Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

May 16, 2024

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Jan 07, 2024

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Sep 04, 2023

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

Jul 16, 2023

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

Jun 18, 2023

Eeg2vec: Self-Supervised Electroencephalographic Representation Learning

May 23, 2023

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

May 16, 2023

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

Apr 23, 2023

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Feb 22, 2023

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Nov 21, 2022