"speech": models, code, and papers

Non-verbal information in spontaneous speech -- towards a new framework of analysis

Mar 06, 2024
Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel

An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance

Apr 01, 2024
Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Apr 01, 2024
Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Mar 02, 2024
Sindhu Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay Namboodiri

FFSTC: Fongbe to French Speech Translation Corpus

Mar 08, 2024
D. Fortune Kponou, Frejus A. A. Laleye, Eugene C. Ezin

HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling

Mar 09, 2024
Chunhui Wang, Chang Zeng, Bowen Zhang, Ziyang Ma, Yefan Zhu, Zifeng Cai, Jian Zhao, Zhonglin Jiang, Yong Chen

Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

Mar 31, 2024
Alkis Koudounas, Flavio Giobergia

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Mar 03, 2024
Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

ChatGPT v.s. Media Bias: A Comparative Study of GPT-3.5 and Fine-tuned Language Models

Mar 29, 2024
Zehao Wen, Rabih Younes

Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition

Mar 13, 2024
Wenjing Zhu, Sining Sun, Changhao Shan, Peng Fan, Qing Yang
