Zhi Zhong

Music Foundation Model as Generic Booster for Music Downstream Tasks

Nov 05, 2024
Via arXiv

OpenMU: Your Swiss Army Knife for Music Understanding

Oct 21, 2024
Via arXiv

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression

Oct 12, 2024
Via arXiv

Variable Bitrate Residual Vector Quantization for Audio Coding

Oct 08, 2024
Via arXiv

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Jun 26, 2024
Via arXiv

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

May 28, 2024
Via arXiv

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

May 23, 2024
Via arXiv

On the Language Encoder of Contrastive Cross-modal Models

Oct 20, 2023
Via arXiv

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

May 18, 2023
Via arXiv

Extending Audio Masked Autoencoders Toward Audio Restoration

May 11, 2023
Via arXiv