Xinxin Zhu

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

Oct 02, 2024
Via arXiv

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation

Oct 02, 2024
Via arXiv

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Aug 28, 2024
Via arXiv

Deep Optimal Timing Strategies for Time Series

Oct 09, 2023
Via arXiv

Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

Jun 16, 2023
Via arXiv

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023
Via arXiv

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

May 25, 2023
Via arXiv

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Apr 17, 2023
Via arXiv

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

Mar 29, 2023
Via arXiv

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Mar 16, 2023
Via arXiv