Wei-Ning Hsu

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Jun 13, 2024

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

Jun 10, 2024

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Apr 16, 2024

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Mar 21, 2024

Audiobox: Unified Audio Generation with Natural Language Prompts

Dec 25, 2023

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

Nov 05, 2023

Generative Pre-training for Speech with Flow Matching

Oct 25, 2023

Toward Joint Language Modeling for Speech Units and Text

Oct 12, 2023

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

Sep 29, 2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Aug 10, 2023