Michael Zeng

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Feb 12, 2024

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Nov 10, 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Aug 03, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 24, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

May 22, 2023