Yanwei Li

Visual Spatial Tuning

Nov 07, 2025
Via arXiv

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025
Via arXiv

Aligning Effective Tokens with Video Anomaly in Large Language Models

Aug 08, 2025
Via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025
Via arXiv

FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records

May 22, 2025
Via arXiv

Seed1.5-VL Technical Report

May 11, 2025
Via arXiv

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Apr 14, 2025
Via arXiv

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025
Via arXiv

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Dec 12, 2024
Via arXiv

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration

Dec 01, 2024
Via arXiv