"speech": models, code, and papers

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Mar 03, 2024
Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Via arXiv

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

Mar 13, 2024
Jiayu Du, Jinpeng Li, Guoguo Chen, Wei-Qiang Zhang

Via arXiv

Decode Neural signal as Speech

Mar 04, 2024
Yiqian Yang, Yiqun Duan, Qiang Zhang, Renjing Xu, Hui Xiong

Via arXiv

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

Feb 29, 2024
Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

Via arXiv

Gender-ambiguous voice generation through feminine speaking style transfer in male voices

Mar 15, 2024
Maria Koutsogiannaki, Shafel Mc Dowall, Ioannis Agiomyrgiannakis

Via arXiv

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Mar 02, 2024
Tyler Benster, Guy Wilson, Reshef Elisha, Francis R Willett, Shaul Druckmann

Via arXiv

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

Mar 13, 2024
Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

Via arXiv

Document Author Classification Using Parsed Language Structure

Mar 20, 2024
Todd K Moon, Jacob H. Gunther

Via arXiv

NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction

Mar 04, 2024
Snehesh Shrestha, Yantian Zha, Saketh Banagiri, Ge Gao, Yiannis Aloimonos, Cornelia Fermuller

Via arXiv

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

Mar 21, 2024
Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

Via arXiv