Jiansheng Chen

XYZCylinder: Feedforward Reconstruction for Driving Scenes Based on A Unified Cylinder Lifting Method

Oct 09, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models

Jun 17, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models

May 25, 2025
Via arXiv

QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

Apr 15, 2025
Via arXiv

CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization

Mar 31, 2025
Via arXiv

LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text

Mar 25, 2025
Via arXiv

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Mar 14, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv