Ming Li

School of Integrated Circuits, Peking University

Step-Audio 2 Technical Report

Jul 24, 2025

PyVision: Agentic Vision with Dynamic Tooling

Jul 10, 2025

Graph Learning for Cooperative Cell-Free ISAC Systems: From Optimization to Estimation

Jul 09, 2025

Robust Brain Tumor Segmentation with Incomplete MRI Modalities Using Hölder Divergence and Mutual Information-Enhanced Knowledge Transfer

Jul 02, 2025

Sekai: A Video Dataset towards World Exploration

Jun 18, 2025

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Jun 11, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025

What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

Jun 08, 2025

Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models

Jun 06, 2025

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

Jun 06, 2025