Picture for Hao Tian

Hao Tian

Sichuan University

MiMo-Audio: Audio Language Models are Few-Shot Learners

Add code
Dec 29, 2025
Viaarxiv icon

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Add code
Dec 19, 2025
Viaarxiv icon

Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction

Add code
Nov 18, 2025
Viaarxiv icon

Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges

Add code
Oct 22, 2025
Viaarxiv icon

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Add code
May 12, 2025
Viaarxiv icon

EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian

Add code
Apr 29, 2025
Viaarxiv icon

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Add code
Mar 13, 2025
Viaarxiv icon

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Add code
Dec 12, 2024
Figure 1 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 2 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 3 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 4 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Figure 1 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 2 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 3 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 4 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Viaarxiv icon