Chenguang Wang

Michael Pokorny

VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

Aug 10, 2025

EVA-MILP: Towards Standardized Evaluation of MILP Instance Generation

May 30, 2025

Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage

May 26, 2025

Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

May 22, 2025

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

May 09, 2025

From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs

Feb 24, 2025

Humanity's Last Exam

Jan 24, 2025

MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Nov 19, 2024

Rethinking the "Heatmap + Monte Carlo Tree Search" Paradigm for Solving Large Scale TSP

Nov 14, 2024

JudgeBench: A Benchmark for Evaluating LLM-based Judges

Oct 16, 2024