Picture for Kexin Tan

Kexin Tan

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

Add code
May 19, 2026
Viaarxiv icon

Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Add code
Jan 18, 2026
Viaarxiv icon

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Add code
Jan 04, 2026
Viaarxiv icon