Jianbing Shen

Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs

Mar 21, 2026
Via arXiv

Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation

Mar 16, 2026
Via arXiv

HanMoVLM: Large Vision-Language Models for Professional Artistic Painting Evaluation

Mar 11, 2026
Via arXiv

Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

Feb 02, 2026
Via arXiv

Towards Geometry-Aware and Motion-Guided Video Human Mesh Recovery

Jan 29, 2026
Via arXiv

From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving

Dec 13, 2025
Via arXiv

TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder

Dec 12, 2025
Via arXiv

Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks

Nov 10, 2025
Via arXiv

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Sep 10, 2025
Via arXiv

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Jul 31, 2025
Via arXiv