Shota Orihashi

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

Feb 21, 2022

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

Nov 24, 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

Nov 22, 2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

Jul 07, 2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

Jul 04, 2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

Jul 04, 2021

Enrollment-less training for personalized voice activity detection

Jun 23, 2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

Jun 23, 2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

Mar 02, 2021

Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents

Feb 16, 2021