Takuya Yoshioka

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 21, 2023

Target Sound Extraction with Variable Cross-modality Clues

Mar 15, 2023

Factual Consistency Oriented Speech Recognition

Feb 24, 2023

Exploring WavLM on Speech Enhancement

Nov 18, 2022

Simulating realistic speech overlaps improves multi-talker ASR

Nov 17, 2022

Real-Time Target Sound Extraction

Nov 14, 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts

Nov 11, 2022

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Nov 10, 2022

Speech separation with large-scale self-supervised learning

Nov 09, 2022