Weiqi Wang

BlindU: Blind Machine Unlearning without Revealing Erasing Data

Jan 12, 2026

Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation

Dec 17, 2025

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

Oct 08, 2025

NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

Oct 08, 2025

Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework

Aug 17, 2025

Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty

Aug 12, 2025

SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding

Jul 27, 2025

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?

May 30, 2025

INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling

May 22, 2025

EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association

May 21, 2025