Aren Jansen

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

May 22, 2024
Via arXiv

Dataset balancing can hurt model performance

Jun 30, 2023
Via arXiv

V2Meow: Meowing to the Visual Beat via Music Generation

May 11, 2023
Via arXiv

MusicLM: Generating Music From Text

Jan 26, 2023
Via arXiv

MAQA: A Multimodal QA Benchmark for Negation

Jan 09, 2023
Via arXiv

MuLan: A Joint Embedding of Music Audio and Natural Language

Aug 26, 2022
Via arXiv

Text-Driven Separation of Arbitrary Sounds

Apr 12, 2022
Via arXiv

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Oct 09, 2021
Via arXiv

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Oct 01, 2021
Via arXiv

Attention Bottlenecks for Multimodal Fusion

Jun 30, 2021
Via arXiv