"speech": models, code, and papers

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

Sep 05, 2023
Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023

Aug 17, 2023
Ming Cheng, Weiqing Wang, Xiaoyi Qin, Yuke Lin, Ning Jiang, Guoqing Zhao, Ming Li

Contrastive Speech Mixup for Low-resource Keyword Spotting

May 02, 2023
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Eng Siong Chng, Bin Ma

On Monotonic Aggregation for Open-domain QA

Aug 08, 2023
Sang-eun Han, Yeonseok Jeong, Seung-won Hwang, Kyungjae Lee

Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection

Jul 07, 2023
Angel Felipe Magnossão de Paula, Paolo Rosso, Damiano Spina

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Aug 09, 2023
Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

May 26, 2023
Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao

An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

May 29, 2023
Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Jul 24, 2023
Muhammad Danyal Khan, Raheem Ali, Arshad Aziz

Contextual Biasing of Named-Entities with Large Language Models

Sep 01, 2023
Chuanneng Sun, Zeeshan Ahmed, Yingyi Ma, Zhe Liu, Yutong Pang, Ozlem Kalinli
