Sijie Zhao

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

May 30, 2024
Via arXiv

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

May 07, 2024
Via arXiv

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Apr 22, 2024
Via arXiv

RS-Mamba for Large Remote Sensing Image Dense Prediction

Apr 10, 2024
Via arXiv

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Dec 14, 2023
Via arXiv

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Nov 27, 2023
Via arXiv

Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization

Nov 19, 2023
Via arXiv

Making LLaMA SEE and Draw with SEED Tokenizer

Oct 02, 2023
Via arXiv

Sticker820K: Empowering Interactive Retrieval with Stickers

Jun 12, 2023
Via arXiv

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

May 30, 2023
Via arXiv