"speech": models, code, and papers

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Oct 25, 2022
Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

Via arXiv

Towards generalizing deep-audio fake detection networks

May 22, 2023
Konstantin Gasenzer, Moritz Wolter

Via arXiv

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Nov 02, 2022
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin

Via arXiv

Anytime, Anywhere: Human Arm Pose from Smartwatch Data for Ubiquitous Robot Control and Teleoperation

Jun 22, 2023
Fabian C Weigend, Shubham Sonawani, Michael Drolet, Heni Ben Amor

Via arXiv

Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective

Nov 05, 2022
Hannaneh B. Pasandi, Haniyeh B. Pasandi

Via arXiv

Generating Holistic 3D Human Motion from Speech

Dec 08, 2022
Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

Via arXiv

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

Jun 06, 2023
Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

Via arXiv

Multi-resolution location-based training for multi-channel continuous speech separation

Jan 16, 2023
Hassan Taherian, DeLiang Wang

Via arXiv

The Double Helix inside the NLP Transformer

Jun 23, 2023
Jason H. J. Lu, Qingzhen Guo

Via arXiv

Implementing contextual biasing in GPU decoder for online ASR

Jun 23, 2023
Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

Via arXiv