Zehuan Yuan

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Jun 10, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Apr 03, 2024

Generative Region-Language Pretraining for Open-Ended Object Detection

Mar 15, 2024

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Dec 25, 2023

General Object Foundation Model for Images and Videos at Scale

Dec 14, 2023

Recognize Any Regions

Nov 02, 2023

CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Oct 25, 2023

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

Aug 23, 2023