Xiaohuan Zhou

Qwen2 Technical Report

Jul 16, 2024

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Feb 12, 2024

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Nov 14, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023

Qwen Technical Report

Sep 28, 2023

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

May 18, 2023

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Dec 08, 2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Nov 29, 2022

Contextual Expressive Text-to-Speech

Nov 26, 2022

Speech2Slot: An End-to-End Knowledge-based Slot Filling from Speech

May 10, 2021