Hao Yang

What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
Aug 23, 2024

NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality
Aug 18, 2024

KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models
Jul 19, 2024

Temporal Label Hierachical Network for Compound Emotion Recognition
Jul 17, 2024

QVD: Post-training Quantization for Video Diffusion Models
Jul 16, 2024

Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations
Jul 04, 2024

An End-to-End Speech Summarization Using Large Language Model
Jul 02, 2024

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning
Jun 25, 2024

Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Jun 25, 2024

Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Jun 12, 2024