Mohan Li

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding

Jun 21, 2024

DiaLoc: An Iterative Approach to Embodied Dialog Localization

Mar 11, 2024

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition

Apr 24, 2023

Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

Apr 21, 2023

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer

Jul 29, 2022

Transformer-based Streaming ASR with Cumulative Attention

Mar 11, 2022

Head-synchronous Decoding for Transformer-based Streaming ASR

Apr 26, 2021

End-to-end Speech Recognition with Adaptive Computation Steps

Sep 26, 2018