Samuel Thomas

Extending RNN-T-based speech recognition systems with emotion and language classification

Jul 28, 2022
Via arXiv

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Apr 11, 2022
Via arXiv

Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding

Apr 11, 2022
Via arXiv

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

Feb 26, 2022
Via arXiv

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Feb 26, 2022
Via arXiv

A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

Feb 21, 2022
Via arXiv

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

Jan 28, 2022
Via arXiv

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021
Via arXiv

Routing with Self-Attention for Multimodal Capsule Networks

Dec 01, 2021
Via arXiv

Cascaded Multilingual Audio-Visual Learning from Videos

Nov 08, 2021
Via arXiv