Songtao Jiang

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Mar 18, 2026
Via arXiv

CodePercept: Code-Grounded Visual STEM Perception for MLLMs

Mar 11, 2026
Via arXiv

From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

Mar 04, 2026
Via arXiv

Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning

Aug 26, 2025
Via arXiv

CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making

Jun 15, 2025
Via arXiv

OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding

Apr 20, 2025
Via arXiv

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment

Oct 20, 2024
Via arXiv

MoE-TinyMed: Mixture of Experts for Tiny Medical Large Vision-Language Models

Apr 16, 2024
Via arXiv

Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models

Apr 06, 2024
Via arXiv