Xiuye Gu

VoCap: Video Object Captioning and Segmentation from Any Prompt

Aug 29, 2025
Via arXiv

Language-Guided Image Tokenization for Generation

Dec 08, 2024

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Dec 21, 2023

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023

Pixel Aligned Language Models

Dec 14, 2023

Photorealistic Video Generation with Diffusion Models

Dec 11, 2023

PolyMaX: General Dense Prediction with Mask Transformer

Nov 09, 2023

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

Feb 13, 2023