Soonshin Seo

HyperCLOVA X Technical Report

Apr 13, 2024
Via arXiv

Unified Speech-Text Pretraining for Spoken Dialog Modeling

Feb 08, 2024
Via arXiv

Encoder-decoder multimodal speaker change detection

Jun 01, 2023
Via arXiv

Blank Collapse: Compressing CTC emission for the faster decoding

Oct 31, 2022
Via arXiv

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

Jul 28, 2020
Via arXiv

Masked cross self-attention encoding for deep speaker embedding

Jan 28, 2020
Via arXiv