Jun Du

A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

Mar 07, 2022
Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee

Via arXiv

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

Feb 17, 2022
Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

Via arXiv

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

Feb 10, 2022
Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

Via arXiv

Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Nov 17, 2021
Hengshun Zhou, Jun Du, Yuanyuan Zhang, Qing Wang, Qing-Feng Liu, Chin-Hui Lee

Via arXiv

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

Aug 07, 2021
Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Via arXiv

Split, embed and merge: An accurate table structure recognizer

Jul 20, 2021
Zhenrong Zhang, Jianshu Zhang, Jun Du

Via arXiv

Separation Guided Speaker Diarization in Realistic Mismatched Conditions

Jul 06, 2021
Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee

Via arXiv

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

Jul 03, 2021
Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee

Via arXiv

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

Apr 08, 2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Via arXiv