"speech": models, code, and papers

Critical Appraisal of Artificial Intelligence-Mediated Communication

May 15, 2023
Dara Tafazoli

Assessing Phrase Break of ESL speech with Pre-trained Language Models

Oct 28, 2022
Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

Mar 13, 2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Nov 17, 2022
Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu

Masked Audio Text Encoders are Effective Multi-Modal Rescorers

May 11, 2023
Jinglun Cai, Monica Sunkara, Xilai Li, Anshu Bhatia, Xiao Pan, Sravan Bodapati

Using LLM-assisted Annotation for Corpus Linguistics: A Case Study of Local Grammar Analysis

May 25, 2023
Danni Yu, Luyang Li, Hang Su, Matteo Fuoli

Improving Scheduled Sampling for Neural Transducer-based ASR

May 25, 2023
Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura

A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers

Apr 16, 2023
Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Matthias Kleinert

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

May 22, 2023
Debarpan Bhattacharya, Neeraj Kumar Sharma, Debottam Dutta, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

The Importance of Accurate Alignments in End-to-End Speech Synthesis

Oct 31, 2022
Anusha Prakash, Hema A Murthy
