Picture for Olivier Siohan

Olivier Siohan

Google Inc

Revisiting the Entropy Semiring for Neural Speech Recognition

Add code
Dec 19, 2023
Viaarxiv icon

On Robustness to Missing Video for Audiovisual Speech Recognition

Add code
Dec 19, 2023
Figure 1 for On Robustness to Missing Video for Audiovisual Speech Recognition
Figure 2 for On Robustness to Missing Video for Audiovisual Speech Recognition
Figure 3 for On Robustness to Missing Video for Audiovisual Speech Recognition
Figure 4 for On Robustness to Missing Video for Audiovisual Speech Recognition
Viaarxiv icon

Audio-visual fine-tuning of audio-only ASR models

Add code
Dec 14, 2023
Viaarxiv icon

Cascaded encoders for fine-tuning ASR models on overlapped speech

Add code
Jun 28, 2023
Viaarxiv icon

Conformers are All You Need for Visual Speech Recogntion

Add code
Feb 17, 2023
Viaarxiv icon

End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

Add code
May 11, 2022
Figure 1 for End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Figure 2 for End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Figure 3 for End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Figure 4 for End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Viaarxiv icon

A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection

Add code
May 11, 2022
Figure 1 for A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Figure 2 for A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Figure 3 for A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Figure 4 for A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Viaarxiv icon

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

Add code
May 10, 2022
Figure 1 for Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Figure 2 for Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Figure 3 for Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Viaarxiv icon

End-to-end multi-talker audio-visual ASR using an active speaker attention module

Add code
Apr 01, 2022
Figure 1 for End-to-end multi-talker audio-visual ASR using an active speaker attention module
Figure 2 for End-to-end multi-talker audio-visual ASR using an active speaker attention module
Figure 3 for End-to-end multi-talker audio-visual ASR using an active speaker attention module
Figure 4 for End-to-end multi-talker audio-visual ASR using an active speaker attention module
Viaarxiv icon

Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition

Add code
Jan 25, 2022
Figure 1 for Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
Figure 2 for Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
Figure 3 for Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
Figure 4 for Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
Viaarxiv icon