Picture for Chenhui Gou

Chenhui Gou

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Add code
Oct 27, 2025
Viaarxiv icon

Emerging Properties in Unified Multimodal Pretraining

Add code
May 20, 2025
Figure 1 for Emerging Properties in Unified Multimodal Pretraining
Figure 2 for Emerging Properties in Unified Multimodal Pretraining
Figure 3 for Emerging Properties in Unified Multimodal Pretraining
Figure 4 for Emerging Properties in Unified Multimodal Pretraining
Viaarxiv icon

Seed1.5-VL Technical Report

Add code
May 11, 2025
Viaarxiv icon

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Add code
Mar 27, 2025
Viaarxiv icon

Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis

Add code
Mar 15, 2025
Viaarxiv icon

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Add code
Nov 22, 2024
Viaarxiv icon

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance

Add code
Sep 12, 2024
Figure 1 for EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
Figure 2 for EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
Figure 3 for EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
Figure 4 for EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
Viaarxiv icon

How Well Can Vision Language Models See Image Details?

Add code
Aug 07, 2024
Figure 1 for How Well Can Vision Language Models See Image Details?
Figure 2 for How Well Can Vision Language Models See Image Details?
Figure 3 for How Well Can Vision Language Models See Image Details?
Figure 4 for How Well Can Vision Language Models See Image Details?
Viaarxiv icon

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

Add code
Jun 28, 2024
Viaarxiv icon

DrVideo: Document Retrieval Based Long Video Understanding

Add code
Jun 18, 2024
Figure 1 for DrVideo: Document Retrieval Based Long Video Understanding
Figure 2 for DrVideo: Document Retrieval Based Long Video Understanding
Figure 3 for DrVideo: Document Retrieval Based Long Video Understanding
Figure 4 for DrVideo: Document Retrieval Based Long Video Understanding
Viaarxiv icon