"speech recognition": models, code, and papers

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Mar 21, 2024
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim, Kijung Shin

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

Feb 20, 2024
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Mar 21, 2024
Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Feb 20, 2024
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe

Privacy-Preserving End-to-End Spoken Language Understanding

Mar 22, 2024
Yinggui Wang, Wei Huang, Le Yang

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Mar 20, 2024
Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

Persian Speech Emotion Recognition by Fine-Tuning Transformers

Feb 11, 2024
Minoo Shayaninasab, Bagher Babaali

Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models

Mar 18, 2024
Linus Nwankwo, Elmar Rueckert

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Feb 08, 2024
Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering

Apr 05, 2024
Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Dong-Kyu Chae
