Yixiao Ge

SEED-Story: Multimodal Long Story Generation with Large Language Model

Jul 11, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Jun 18, 2024

GrootVL: Tree Topology is All You Need in State Space Model

Jun 04, 2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

May 13, 2024

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

May 07, 2024

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Apr 25, 2024

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Apr 22, 2024

ST-LLM: Large Language Models Are Effective Temporal Learners

Mar 30, 2024

YOLO-World: Real-Time Open-Vocabulary Object Detection

Feb 02, 2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Jan 25, 2024