Picture for Ankur Bapna

Ankur Bapna

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Add code
Mar 08, 2024
Viaarxiv icon

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

Multimodal Modeling For Spoken Language Identification

Add code
Sep 19, 2023
Figure 1 for Multimodal Modeling For Spoken Language Identification
Figure 2 for Multimodal Modeling For Spoken Language Identification
Figure 3 for Multimodal Modeling For Spoken Language Identification
Figure 4 for Multimodal Modeling For Spoken Language Identification
Viaarxiv icon

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Add code
Sep 09, 2023
Figure 1 for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Figure 2 for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Figure 3 for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Figure 4 for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Viaarxiv icon

AudioPaLM: A Large Language Model That Can Speak and Listen

Add code
Jun 22, 2023
Figure 1 for AudioPaLM: A Large Language Model That Can Speak and Listen
Figure 2 for AudioPaLM: A Large Language Model That Can Speak and Listen
Figure 3 for AudioPaLM: A Large Language Model That Can Speak and Listen
Figure 4 for AudioPaLM: A Large Language Model That Can Speak and Listen
Viaarxiv icon

Label Aware Speech Representation Learning For Language Identification

Add code
Jun 07, 2023
Figure 1 for Label Aware Speech Representation Learning For Language Identification
Figure 2 for Label Aware Speech Representation Learning For Language Identification
Figure 3 for Label Aware Speech Representation Learning For Language Identification
Figure 4 for Label Aware Speech Representation Learning For Language Identification
Viaarxiv icon

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Add code
May 30, 2023
Figure 1 for LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Figure 2 for LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Figure 3 for LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Figure 4 for LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Viaarxiv icon

Understanding Shared Speech-Text Representations

Add code
Apr 27, 2023
Figure 1 for Understanding Shared Speech-Text Representations
Figure 2 for Understanding Shared Speech-Text Representations
Figure 3 for Understanding Shared Speech-Text Representations
Figure 4 for Understanding Shared Speech-Text Representations
Viaarxiv icon

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Add code
Mar 03, 2023
Figure 1 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 2 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 3 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 4 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Viaarxiv icon

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Add code
Mar 03, 2023
Figure 1 for Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Figure 2 for Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Figure 3 for Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Figure 4 for Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Viaarxiv icon