Can Ma

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026
Via arXiv

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026
Via arXiv

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation

Mar 11, 2026
Via arXiv

Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning

Feb 12, 2026
Via arXiv

EXaMCaP: Subset Selection with Entropy Gain Maximization for Probing Capability Gains of Large Chart Understanding Training Sets

Feb 04, 2026
Via arXiv

EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration

Dec 17, 2025
Via arXiv

Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

Sep 26, 2025
Via arXiv

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective

Aug 06, 2025
Via arXiv

An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability

May 22, 2025
Via arXiv

Multi-Modal Molecular Representation Learning via Structure Awareness

May 09, 2025
Via arXiv