"speech": models, code, and papers

Automatic Documentation of ICD Codes with Far-Field Speech Recognition

Nov 04, 2018
Albert Haque, Corinna Fukushima

Towards Relatable Explainable AI with the Perceptual Process

Dec 28, 2021
Wencan Zhang, Brian Y. Lim

Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation

Jun 17, 2019
Siyuan Feng, Tan Lee

Dataset of Spatial Room Impulse Responses in a Variable Acoustics Room for Six Degrees-of-Freedom Rendering and Analysis

Nov 23, 2021
Thomas McKenzie, Leo McCormack, Christoph Hold

On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

Jun 09, 2020
Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik

Video-to-Video Translation for Visual Speech Synthesis

May 28, 2019
Michail C. Doukas, Viktoriia Sharmanska, Stefanos Zafeiriou

Semantic Mask for Transformer based End-to-End Speech Recognition

Dec 06, 2019
Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

Dec 27, 2021
Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

Attention Based Fully Convolutional Network for Speech Emotion Recognition

Jun 05, 2018
Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

May 30, 2020
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
