Han Hu

University of Toronto

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026
Via arXiv

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026
Via arXiv

CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models

Mar 13, 2026
Via arXiv

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing

Mar 13, 2026
Via arXiv

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Mar 12, 2026
Via arXiv

Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models

Feb 17, 2026
Via arXiv

FAIL: Flow Matching Adversarial Imitation Learning for Image Generation

Feb 12, 2026
Via arXiv

GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks?

Feb 05, 2026
Via arXiv

Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies

Feb 02, 2026
Via arXiv

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding

Jan 27, 2026
Via arXiv