Christian Fuegen

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024
Via arXiv

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Jan 18, 2024
Via arXiv

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Nov 12, 2023
Via arXiv

End-to-End Speech Recognition Contextualization with Large Language Models

Sep 19, 2023
Via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023
Via arXiv

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023
Via arXiv

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Nov 03, 2022
Via arXiv

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Apr 19, 2022
Via arXiv

Scaling ASR Improves Zero and Few Shot Learning

Nov 29, 2021
Via arXiv

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Oct 13, 2021
Via arXiv