Shilin Yan

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Jun 05, 2025

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

May 26, 2025

Progressive Scaling Visual Object Tracking

May 26, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

May 22, 2025

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

May 20, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

May 01, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Feb 06, 2025

General Compression Framework for Efficient Transformer Object Tracking

Sep 26, 2024

VISA: Reasoning Video Object Segmentation via Large Language Models

Jul 16, 2024