"speech": models, code, and papers

Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

May 02, 2023
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault, Khalid Choukri

Via arXiv

Language-independent speaker anonymization using orthogonal Householder neural network

May 30, 2023
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

Via arXiv

Building Blocks for a Complex-Valued Transformer Architecture

Jun 16, 2023
Florian Eilers, Xiaoyi Jiang

Via arXiv

Incorporating Deep Syntactic and Semantic Knowledge for Chinese Sequence Labeling with GCN

Jun 03, 2023
Xuemei Tang, Jun Wang, Qi Su

Via arXiv

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer

Jun 13, 2023
Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff

Via arXiv

Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

Mar 14, 2023
Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhongyuan Wang

Via arXiv

IMaSC -- ICFOSS Malayalam Speech Corpus

Nov 23, 2022
Deepa P Gopinath, Thennal D K, Vrinda V Nair, Swaraj K S, Sachin G

Via arXiv

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

May 18, 2023
Won Jang, Dan Lim, Heayoung Park

Via arXiv

Target Active Speaker Detection with Audio-visual Cues

May 22, 2023
Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

Via arXiv

Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope

Jan 14, 2023
Yuran Zhang, Jiajie Zou, Nai Ding

Via arXiv