"speech": models, code, and papers

SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages

Mar 14, 2024
René Groh, Nina Goes, Andreas M. Kist

On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems

Mar 12, 2024
Cristian Cioflan, Lukas Cavigelli, Manuele Rusci, Miguel de Prado, Luca Benini

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

Mar 16, 2024
Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Mar 11, 2024
Amit Eliav, Sharon Gannot

Cosine Scoring with Uncertainty for Neural Speaker Embedding

Mar 11, 2024
Qiongqiong Wang, Kong Aik Lee

From "um" to "yeah": Producing, predicting, and regulating information flow in human conversation

Mar 13, 2024
Claire Augusta Bergey, Simon DeDeo

Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking

Mar 13, 2024
Ming Dong, Yujing Chen, Miao Zhang, Hao Sun, Tingting He

Syllable based DNN-HMM Cantonese Speech to Text System

Feb 13, 2024
Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

Feb 17, 2024
Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing

Mar 16, 2024
Yin Li, Rajalakshmi Nandakumar