Jusheng Zhang

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Dec 23, 2025
Via arXiv

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Dec 09, 2025
Via arXiv

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025
Via arXiv

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

Nov 17, 2025
Via arXiv

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

Nov 17, 2025
Via arXiv

Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens

Oct 30, 2025
Via arXiv

RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability

Oct 26, 2025
Via arXiv

Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

Oct 26, 2025
Via arXiv

Guardian: Decoupling Exploration from Safety in Reinforcement Learning

Oct 26, 2025
Via arXiv

Top-Down Semantic Refinement for Image Captioning

Oct 25, 2025
Via arXiv