Cunxiang Wang

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation

Jan 12, 2026
Via arXiv

DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation

Jan 08, 2026
Via arXiv

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025
Via arXiv

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025
Via arXiv

Unlocking Recursive Thinking of LLMs: Alignment via Refinement

Jun 06, 2025
Via arXiv

Exploring the Evolution of Physics Cognition in Video Generation: A Survey

Mar 27, 2025
Via arXiv

StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error

Mar 13, 2025
Via arXiv

LongSafety: Evaluating Long-Context Safety of Large Language Models

Feb 24, 2025
Via arXiv

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators

Feb 18, 2025
Via arXiv

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Dec 16, 2024
Via arXiv