Picture for Nanxin Chen

Nanxin Chen

Text Injection for Neural Contextual Biasing

Add code
Jun 05, 2024
Viaarxiv icon

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Add code
Mar 08, 2024
Viaarxiv icon

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Add code
Nov 02, 2023
Viaarxiv icon

SLM: Bridge the thin gap between speech and text foundation models

Add code
Sep 30, 2023
Figure 1 for SLM: Bridge the thin gap between speech and text foundation models
Figure 2 for SLM: Bridge the thin gap between speech and text foundation models
Figure 3 for SLM: Bridge the thin gap between speech and text foundation models
Figure 4 for SLM: Bridge the thin gap between speech and text foundation models
Viaarxiv icon

Efficient Adapters for Giant Speech Models

Add code
Jun 13, 2023
Figure 1 for Efficient Adapters for Giant Speech Models
Figure 2 for Efficient Adapters for Giant Speech Models
Figure 3 for Efficient Adapters for Giant Speech Models
Figure 4 for Efficient Adapters for Giant Speech Models
Viaarxiv icon

How to Estimate Model Transferability of Pre-Trained Speech Models?

Add code
Jun 01, 2023
Figure 1 for How to Estimate Model Transferability of Pre-Trained Speech Models?
Figure 2 for How to Estimate Model Transferability of Pre-Trained Speech Models?
Figure 3 for How to Estimate Model Transferability of Pre-Trained Speech Models?
Figure 4 for How to Estimate Model Transferability of Pre-Trained Speech Models?
Viaarxiv icon

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Add code
Mar 03, 2023
Figure 1 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 2 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 3 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Figure 4 for Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Viaarxiv icon

Noise2Music: Text-conditioned Music Generation with Diffusion Models

Add code
Feb 08, 2023
Figure 1 for Noise2Music: Text-conditioned Music Generation with Diffusion Models
Figure 2 for Noise2Music: Text-conditioned Music Generation with Diffusion Models
Figure 3 for Noise2Music: Text-conditioned Music Generation with Diffusion Models
Figure 4 for Noise2Music: Text-conditioned Music Generation with Diffusion Models
Viaarxiv icon

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Add code
Jan 19, 2023
Figure 1 for From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Figure 2 for From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Figure 3 for From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Figure 4 for From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Viaarxiv icon