Picture for Christian Fuegen

Christian Fuegen

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Add code
Apr 03, 2023
Viaarxiv icon

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Add code
Nov 03, 2022
Viaarxiv icon

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Add code
Apr 19, 2022
Figure 1 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 2 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 3 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Figure 4 for An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Viaarxiv icon

Scaling ASR Improves Zero and Few Shot Learning

Add code
Nov 29, 2021
Figure 1 for Scaling ASR Improves Zero and Few Shot Learning
Figure 2 for Scaling ASR Improves Zero and Few Shot Learning
Figure 3 for Scaling ASR Improves Zero and Few Shot Learning
Figure 4 for Scaling ASR Improves Zero and Few Shot Learning
Viaarxiv icon

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Add code
Oct 13, 2021
Figure 1 for Ego4D: Around the World in 3,000 Hours of Egocentric Video
Figure 2 for Ego4D: Around the World in 3,000 Hours of Egocentric Video
Figure 3 for Ego4D: Around the World in 3,000 Hours of Egocentric Video
Figure 4 for Ego4D: Around the World in 3,000 Hours of Egocentric Video
Viaarxiv icon

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

Add code
Oct 11, 2021
Figure 1 for Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
Figure 2 for Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
Figure 3 for Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
Figure 4 for Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
Viaarxiv icon

Do sound event representations generalize to other audio tasks? A case study in audio transfer learning

Add code
Jun 21, 2021
Figure 1 for Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Figure 2 for Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Figure 3 for Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Figure 4 for Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Viaarxiv icon

Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios

Add code
Apr 06, 2021
Figure 1 for Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios
Figure 2 for Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios
Figure 3 for Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios
Figure 4 for Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios
Viaarxiv icon

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

Add code
Apr 06, 2021
Figure 1 for Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Figure 2 for Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Figure 3 for Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Figure 4 for Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Viaarxiv icon

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

Add code
Apr 05, 2021
Figure 1 for Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Figure 2 for Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Figure 3 for Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Figure 4 for Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Viaarxiv icon