Jianqing Gao

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025
Via arXiv

Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training

Sep 16, 2025
Via arXiv

Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape

Aug 10, 2025
Via arXiv

Dual form Complementary Masking for Domain-Adaptive Image Segmentation

Jul 16, 2025
Via arXiv

Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration

Apr 17, 2025
Via arXiv

Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models

Feb 09, 2025
Via arXiv

Latent Swap Joint Diffusion for Long-Form Audio Generation

Feb 07, 2025
Via arXiv

Deep CLAS: Deep Contextual Listen, Attend and Spell

Sep 26, 2024
Via arXiv

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

Sep 03, 2024
Via arXiv

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023
Via arXiv