Md Rizwan Parvez

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
Apr 19, 2026

Omni-Modal Dissonance Benchmark: Systematically Breaking Modality Consensus to Probe Robustness and Calibrated Abstention
Mar 28, 2026

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?
Feb 03, 2026

WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment
Dec 14, 2025

DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
Aug 24, 2025

Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Jun 17, 2025

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Apr 15, 2025

ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
Apr 10, 2025

CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Feb 08, 2025

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Dec 31, 2024