Weijia Li

IPCV: Information-Preserving Compression for MLLM Visual Encoders

Dec 21, 2025

OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild

Nov 11, 2025

OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

Oct 30, 2025

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025

UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective

Sep 26, 2025

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Aug 13, 2025

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

May 29, 2025

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

May 25, 2025

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

May 18, 2025