
Guangzhi Sun

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

Nov 27, 2024 (via arXiv)

SkillAggregation: Reference-free LLM-Dependent Aggregation

Oct 14, 2024 (via arXiv)

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

Oct 09, 2024 (via arXiv)

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024 (via arXiv)

Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

Sep 17, 2024 (via arXiv)

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Sep 15, 2024 (via arXiv)

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

Aug 28, 2024 (via arXiv)

Speaker Adaptation for Quantised End-to-End ASR Models

Aug 07, 2024 (via arXiv)

SOT Triggered Neural Clustering for Speaker Attributed ASR

Jul 02, 2024 (via arXiv)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Jun 28, 2024 (via arXiv)