Xiuye Gu

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Dec 21, 2023

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023

Pixel Aligned Language Models

Dec 14, 2023

Photorealistic Video Generation with Diffusion Models

Dec 11, 2023

PolyMaX: General Dense Prediction with Mask Transformer

Nov 09, 2023

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

Feb 13, 2023

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Dec 20, 2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022