Taejin Park

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024
Via arXiv

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Sep 10, 2024
Via arXiv

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

Sep 02, 2024
Via arXiv

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Aug 23, 2024
Via arXiv

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

Jul 23, 2024
Via arXiv

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

Oct 08, 2021
Via arXiv

Robust Multi-channel Speech Recognition using Frequency Aligned Network

Feb 06, 2020
Via arXiv