Boqiang Zhang

Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios

Jan 02, 2026

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Dec 18, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Jan 08, 2025

Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents

Oct 25, 2024

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Oct 17, 2024

How Control Information Influences Multilingual Text Image Generation and Editing?

Jul 16, 2024

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Jul 08, 2024

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

May 11, 2024