Rao Muhammad Anwer

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Jun 06, 2025
Via arXiv

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

May 30, 2025
Via arXiv

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

May 26, 2025
Via arXiv

OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning

May 22, 2025
Via arXiv

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

May 22, 2025
Via arXiv

Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining

Apr 30, 2025
Via arXiv

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Mar 18, 2025
Via arXiv

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025
Via arXiv

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Mar 06, 2025
Via arXiv

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Feb 28, 2025
Via arXiv