Govind Pimpale

Large Language Models Often Know When They Are Being Evaluated

May 28, 2025
Via arXiv

Forecasting Frontier Language Model Agent Capabilities

Feb 21, 2025
Via arXiv

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents

Oct 08, 2024
Via arXiv

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

Sep 24, 2024
Via arXiv