Zehuan Yuan

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024
Via arXiv

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Dec 04, 2024
Via arXiv

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Oct 24, 2024
Via arXiv

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Jun 13, 2024
Via arXiv

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Jun 10, 2024
Via arXiv

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024
Via arXiv

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Apr 03, 2024
Via arXiv

Generative Region-Language Pretraining for Open-Ended Object Detection

Mar 15, 2024
Via arXiv

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Dec 25, 2023
Via arXiv

General Object Foundation Model for Images and Videos at Scale

Dec 14, 2023
Via arXiv