Alexander Cheung

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Jun 03, 2026
Via arXiv

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

Jan 30, 2025
Via arXiv