Picture for Yuan Gong

Yuan Gong

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Add code
Sep 17, 2024
Figure 1 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 2 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 3 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 4 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Viaarxiv icon

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Add code
Jul 04, 2024
Figure 1 for DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
Figure 2 for DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
Figure 3 for DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
Figure 4 for DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners
Viaarxiv icon

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Add code
Jun 26, 2024
Figure 1 for Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
Figure 2 for Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
Figure 3 for Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
Figure 4 for Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
Viaarxiv icon

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Add code
Jun 14, 2024
Figure 1 for Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Figure 2 for Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Figure 3 for Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Figure 4 for Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Viaarxiv icon

Generic Knowledge Boosted Pre-training For Remote Sensing Images

Add code
Jan 21, 2024
Figure 1 for Generic Knowledge Boosted Pre-training For Remote Sensing Images
Figure 2 for Generic Knowledge Boosted Pre-training For Remote Sensing Images
Figure 3 for Generic Knowledge Boosted Pre-training For Remote Sensing Images
Figure 4 for Generic Knowledge Boosted Pre-training For Remote Sensing Images
Viaarxiv icon

Joint Audio and Speech Understanding

Add code
Oct 02, 2023
Figure 1 for Joint Audio and Speech Understanding
Figure 2 for Joint Audio and Speech Understanding
Figure 3 for Joint Audio and Speech Understanding
Figure 4 for Joint Audio and Speech Understanding
Viaarxiv icon

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Add code
Sep 19, 2023
Viaarxiv icon

ToonTalker: Cross-Domain Face Reenactment

Add code
Aug 24, 2023
Viaarxiv icon

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

Add code
Jul 13, 2023
Figure 1 for Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Figure 2 for Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Figure 3 for Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Figure 4 for Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Viaarxiv icon

Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers

Add code
Jul 06, 2023
Figure 1 for Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Figure 2 for Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Figure 3 for Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Figure 4 for Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Viaarxiv icon