Yang Shi

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Dec 17, 2025
Via arXiv

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Dec 14, 2025
Via arXiv

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Dec 09, 2025
Via arXiv

PR-CapsNet: Pseudo-Riemannian Capsule Network with Adaptive Curvature Routing for Graph Learning

Dec 09, 2025
Via arXiv

Hybrid Attribution Priors for Explainable and Robust Model Training

Dec 09, 2025
Via arXiv

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Oct 16, 2025
Via arXiv

BaseReward: A Strong Baseline for Multimodal Reward Model

Sep 19, 2025
Via arXiv

Collaborative-Online-Learning-Enabled Distributionally Robust Motion Control for Multi-Robot Systems

Aug 24, 2025
Via arXiv

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Jun 10, 2025
Via arXiv

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025
Via arXiv