Benedikt Stroebl

Dynamic Risk Assessments for Offensive Cybersecurity Agents

May 23, 2025

Localized Cultural Knowledge is Conserved and Controllable in Large Language Models

Apr 14, 2025

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

Dec 02, 2024

Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers

Nov 26, 2024

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Sep 17, 2024

AI Agents That Matter

Jul 01, 2024