Shuo-yiin Chang

Google

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Dec 18, 2023
Via arXiv

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Sep 25, 2023
Via arXiv

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Aug 14, 2023
Via arXiv

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

May 28, 2023
Via arXiv

UML: A Universal Monolingual Output Layer for Multilingual ASR

Feb 22, 2023
Via arXiv

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems

Nov 01, 2022
Via arXiv

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Sep 13, 2022
Via arXiv

A Language Agnostic Multilingual Streaming On-Device ASR System

Aug 29, 2022
Via arXiv