Zhenguo Li

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration

Sep 17, 2024

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Jul 19, 2024

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

Jul 08, 2024

Jailbreaking as a Reward Misspecification Problem

Jun 20, 2024

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

Jun 11, 2024

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Jun 06, 2024

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion

May 24, 2024

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model

May 24, 2024

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024