Weida Wang

PolyReal: A Benchmark for Real-World Polymer Science Workflows

Apr 03, 2026

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Feb 09, 2026

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Feb 06, 2026

SecureSplit: Mitigating Backdoor Attacks in Split Learning

Jan 20, 2026

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Dec 30, 2025

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Oct 02, 2025

ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Sep 10, 2025

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Aug 25, 2025

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Aug 11, 2025

S2R-Bench: A Sim-to-Real Evaluation Benchmark for Autonomous Driving

May 24, 2025