Boqiang Zhang

Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios

Jan 02, 2026

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Dec 18, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Jan 08, 2025

Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents

Oct 25, 2024

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Oct 17, 2024

How Control Information Influences Multilingual Text Image Generation and Editing?

Jul 16, 2024

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Jul 08, 2024

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

May 11, 2024