Yuexian Zou

CAR: Controllable Autoregressive Modeling for Visual Generation

Oct 07, 2024

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024

Image Conductor: Precision Control for Interactive Video Synthesis

Jun 21, 2024

On the Worst Prompt Performance of Large Language Models

Jun 08, 2024

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

May 31, 2024

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Mar 22, 2024

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework

Mar 14, 2024

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs

Mar 10, 2024

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

Mar 02, 2024