"speech": models, code, and papers

THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech

Nov 11, 2023
Saad Almohaimeed, Saleh Almohaimeed, Ashfaq Ali Shafin, Bogdan Carbunar, Ladislau Bölöni

Optimizing Convolutional Neural Network Architecture

Dec 17, 2023
Luis Balderas, Miguel Lastra, José M. Benítez

Leveraging cache to enable SLU on tiny devices

Nov 30, 2023
Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin

Acoustic BPE for Speech Generation with Discrete Tokens

Oct 23, 2023
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words

Dec 07, 2023
Lukas Wolf, Greta Tuckute, Klemen Kotar, Eghbal Hosseini, Tamar Regev, Ethan Wilcox, Alex Warstadt

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

Dec 15, 2023
Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Oct 27, 2023
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

Collaborative Learning with Artificial Intelligence Speakers (CLAIS): Pre-Service Elementary Science Teachers' Responses to the Prototype

Dec 20, 2023
Gyeong-Geon Lee, Seonyeong Mun, Myeong-Kyeong Shin, Xiaoming Zhai

Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

Nov 29, 2023
Pavel Korshunov, Haolin Chen, Philip N. Garner, Sebastien Marcel

An Exploration of In-Context Learning for Speech Language Model

Oct 19, 2023
Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-yi Lee
