
Xun Gong

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Mar 20, 2024
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

Via arXiv

Generating Multi-Center Classifier via Conditional Gaussian Distribution

Jan 29, 2024
Zhemin Zhang, Xun Gong

Via arXiv

Vision Big Bird: Random Sparsification for Full Attention

Nov 10, 2023
Zhemin Zhang, Xun Gong

Via arXiv

Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation

Oct 14, 2023
Zhen Liu, Hang Gao, Hao Ma, Shuo Cai, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong

Via arXiv

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

May 18, 2023
Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian

Via arXiv

RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows

Apr 27, 2023
Zhemin Zhang, Xun Gong

Via arXiv

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

Nov 17, 2022
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

Via arXiv

BoundaryFace: A mining framework with noise label self-correction for Face Recognition

Oct 10, 2022
Shijie Wu, Xun Gong

Via arXiv

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Sep 30, 2022
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

Via arXiv