Samuel Thomas

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
May 21, 2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
Apr 04, 2023
Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Mar 29, 2023
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Oct 07, 2022
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Extending RNN-T-based speech recognition systems with emotion and language classification
Jul 28, 2022
Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
Apr 11, 2022
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury

Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding
Apr 11, 2022
Vishal Sunder, Samuel Thomas, Hong-Kwang J. Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
Feb 26, 2022
Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models
Feb 26, 2022
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets
Feb 21, 2022
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury
