Jusheng Zhang

Process-of-Thought Reasoning for Videos

Feb 07, 2026

Spectral Gating Networks

Feb 07, 2026

Rational ANOVA Networks

Feb 03, 2026

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

Jan 26, 2026

ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation

Jan 23, 2026

3D-Agent: Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation

Jan 07, 2026

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Dec 23, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Dec 09, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

Nov 17, 2025