Jun Du

Multitask frame-level learning for few-shot sound event detection

Mar 17, 2024
Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

Via arXiv

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Mar 07, 2024
Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

Via arXiv

Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

Dec 31, 2023
Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du

Via arXiv

CDSD: Chinese Dysarthria Speech Database

Oct 24, 2023
Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang

Via arXiv

Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

Sep 17, 2023
Zilu Guo, Jun Du, Chin-Hui Lee

Via arXiv

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

Sep 17, 2023
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

Via arXiv

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Via arXiv

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

Sep 11, 2023
Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

Via arXiv

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

Aug 28, 2023
Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

Via arXiv

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

Aug 14, 2023
Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

Via arXiv