Jizhong Liu

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Mar 25, 2026

DashengTokenizer: One layer is enough for unified audio understanding and generation

Feb 27, 2026

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025

GLAP: General contrastive audio-text pretraining across domains and languages

Jun 12, 2025

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Mar 17, 2025

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024

Bridging Language Gaps in Audio-Text Retrieval

Jun 11, 2024