Hu Wei

Logics-Parsing-Omni Technical Report

Mar 12, 2026
Via arXiv

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Mar 04, 2026
Via arXiv

ClinConsensus: A Consensus-Based Benchmark for Evaluating Chinese Medical LLMs across Difficulty Levels

Mar 03, 2026
Via arXiv

SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy

Feb 26, 2026
Via arXiv

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Feb 18, 2026
Via arXiv

HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam

Feb 17, 2026
Via arXiv

Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning

Feb 12, 2026
Via arXiv

Beyond Quantity: Trajectory Diversity Scaling for Code Agents

Feb 03, 2026
Via arXiv

Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

Feb 03, 2026
Via arXiv

Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction

Feb 03, 2026
Via arXiv