Shaoxiang Chen

OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

May 24, 2025
Via arXiv

One RL to See Them All: Visual Triple Unified Reinforcement Learning

May 23, 2025
Via arXiv

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding

Apr 06, 2025
Via arXiv

ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

Nov 04, 2024
Via arXiv

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

Sep 26, 2024
Via arXiv

EventHallusion: Diagnosing Event Hallucinations in Video LLMs

Sep 25, 2024
Via arXiv

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Aug 25, 2024
Via arXiv

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

Jul 03, 2024
Via arXiv

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

Jun 12, 2024
Via arXiv

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

Mar 12, 2024
Via arXiv