Guangzhi Sun

Measuring the Redundancy of Decoder Layers in SpeechLLMs

Mar 05, 2026
Via arXiv

Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens

Feb 18, 2026
Via arXiv

Who can we trust? LLM-as-a-jury for Comparative Assessment

Feb 18, 2026
Via arXiv

OCR-Enhanced Multimodal ASR Can Read While Listening

Jan 26, 2026
Via arXiv

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

Nov 14, 2025
Via arXiv

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

Jun 18, 2025
Via arXiv

Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?

May 05, 2025
Via arXiv

ACVUBench: Audio-Centric Video Understanding Benchmark

Mar 25, 2025
Via arXiv

Improving LLM Video Understanding with 16 Frames Per Second

Mar 18, 2025
Via arXiv

Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

Feb 26, 2025
Via arXiv