Zalán Borsos

MusicRL: Aligning Music Generation to Human Preferences

Feb 06, 2024

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

Aug 21, 2023

AudioPaLM: A Large Language Model That Can Speak and Listen

Jun 22, 2023

SoundStorm: Efficient Parallel Audio Generation

May 16, 2023

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

Mar 23, 2023

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

Feb 07, 2023

MusicLM: Generating Music From Text

Jan 26, 2023

AudioLM: a Language Modeling Approach to Audio Generation

Sep 07, 2022

Disentangling speech from surroundings in a neural audio codec

Mar 29, 2022

SpeechPainter: Text-conditioned Speech Inpainting

Feb 15, 2022