Hao Yang

KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

Jul 19, 2024

Temporal Label Hierachical Network for Compound Emotion Recognition

Jul 17, 2024

QVD: Post-training Quantization for Video Diffusion Models

Jul 16, 2024

Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations

Jul 04, 2024

An End-to-End Speech Summarization Using Large Language Model

Jul 02, 2024

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Jun 25, 2024

Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights

Jun 25, 2024

Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

Jun 12, 2024

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

Jun 07, 2024

Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach

May 29, 2024