Yuping Wang

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Jul 05, 2024

Improving Audio Generation with Visual Enhanced Caption

Jul 05, 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024

Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare

May 10, 2024

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Apr 11, 2024

CMP: Cooperative Motion Prediction with Multi-Agent Communication

Mar 26, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Feb 07, 2024

Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

Dec 15, 2023

Novel View Synthesis from a Single RGBD Image for Indoor Scenes

Nov 02, 2023

EqDrive: Efficient Equivariant Motion Forecasting with Multi-Modality for Autonomous Driving

Oct 26, 2023