Yuying Ge

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

May 07, 2024
Via arXiv

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Apr 25, 2024
Via arXiv

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Apr 22, 2024
Via arXiv

Supervised Fine-tuning in turn Improves Visual Foundation Models

Jan 18, 2024
Via arXiv

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Dec 14, 2023
Via arXiv

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models

Dec 11, 2023
Via arXiv

SEED-Bench-2: Benchmarking Multimodal Large Language Models

Nov 28, 2023
Via arXiv

ViT-Lens-2: Gateway to Omni-modal Intelligence

Nov 27, 2023
Via arXiv

Making LLaMA SEE and Draw with SEED Tokenizer

Oct 02, 2023
Via arXiv

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Sep 01, 2023
Via arXiv