Yuan Gong

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024
Via arXiv

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Jun 26, 2024
Via arXiv

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024
Via arXiv

Generic Knowledge Boosted Pre-training For Remote Sensing Images

Jan 21, 2024
Via arXiv

Joint Audio and Speech Understanding

Oct 02, 2023
Via arXiv

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Sep 19, 2023
Via arXiv

ToonTalker: Cross-Domain Face Reenactment

Aug 24, 2023
Via arXiv

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

Jul 13, 2023
Via arXiv

Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers

Jul 06, 2023
Via arXiv

TaleCrafter: Interactive Story Visualization with Multiple Characters

May 30, 2023
Via arXiv