"speech": models, code, and papers

Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise

Aug 31, 2021
Mingyu Dong, Diqun Yan, Yongkang Gong, Rangding Wang

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation

Apr 22, 2022
Detai Xin, Shinnosuke Takamichi, Takuma Okamoto, Hisashi Kawai, Hiroshi Saruwatari

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Dec 05, 2021
Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou

Handling and Presenting Harmful Text

Apr 29, 2022
Leon Derczynski, Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen

ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition

Jul 14, 2021
Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristia, Emmanuel Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen, Chen Yu

Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders

May 12, 2021
Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization

May 19, 2022
Siddharth S. Nijhawan, Homayoon Beigi

Machine Learning based COVID-19 Detection from Smartphone Recordings: Cough, Breath and Speech

Apr 12, 2021
Madhurananda Pahar, Thomas Niesler

MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement

Feb 04, 2021
Xinmeng Xu, Yang Wang, Dongxiang Xu, Yiyuan Peng, Cong Zhang, Jie Jia, Yang Wang, Binbin Chen

When can I Speak? Predicting initiation points for spoken dialogue agents

Aug 07, 2022
Siyan Li, Ashwin Paranjape, Christopher D. Manning
