Chao-Han Huck Yang

An Investigation of Incorporating Mamba for Speech Enhancement

May 10, 2024

Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

Apr 23, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Feb 10, 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Feb 08, 2024

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

Jan 19, 2024

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

Jan 19, 2024

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Jan 17, 2024

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Dec 22, 2023

Conditional Modeling Based Automatic Video Summarization

Nov 20, 2023

Generative error correction for code-switching speech recognition using large language models

Oct 17, 2023