Haohe Liu

Audio-FLAN: A Preliminary Release

Feb 23, 2025
Via arXiv

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Feb 06, 2025
Via arXiv

AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models

Nov 28, 2024
Via arXiv

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching

Sep 11, 2024
Via arXiv

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Jul 19, 2024
Via arXiv

Text-Queried Target Sound Event Localization

Jun 23, 2024
Via arXiv

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Jun 20, 2024
Via arXiv

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Jun 10, 2024
Via arXiv

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Apr 30, 2024
Via arXiv

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Apr 27, 2024
Via arXiv