
Ming Li

School of Integrated Circuits, Peking University

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026
Via arXiv

D-GVIO: A Buffer-Driven and Efficient Decentralized GNSS-Visual-Inertial State Estimator for Multi-Agent Systems

Mar 03, 2026
Via arXiv

Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition

Feb 24, 2026
Via arXiv

PyVision-RL: Forging Open Agentic Vision Models via RL

Feb 24, 2026
Via arXiv

DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling

Feb 18, 2026
Via arXiv

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Feb 15, 2026
Via arXiv

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

Feb 12, 2026
Via arXiv

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Feb 11, 2026
Via arXiv

Clutter-Aware Integrated Sensing and Communication: Models, Methods, and Future Directions

Feb 11, 2026
Via arXiv

PhyCritic: Multimodal Critic Models for Physical AI

Feb 11, 2026
Via arXiv