Jiansheng Chen

Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

Mar 26, 2026
Via arXiv

Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

Mar 25, 2026
Via arXiv

Step-DeepResearch Technical Report

Dec 24, 2025
Via arXiv

XYZCylinder: Feedforward Reconstruction for Driving Scenes Based on A Unified Cylinder Lifting Method

Oct 09, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models

Jun 17, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models

May 25, 2025
Via arXiv

QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

Apr 15, 2025
Via arXiv

CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization

Mar 31, 2025
Via arXiv