Akihiko Takashima

End-to-End Joint Target and Non-Target Speakers ASR

Jun 04, 2023

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Oct 28, 2022

Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

Feb 21, 2022

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

Nov 24, 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling

Nov 22, 2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

Jul 07, 2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

Jul 04, 2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

Jul 04, 2021

Enrollment-less training for personalized voice activity detection

Jun 23, 2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

Jun 23, 2021