Boyi Wei

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025
Via arXiv

Scaling Latent Reasoning via Looped Language Models

Oct 29, 2025
Via arXiv

Dynamic Risk Assessments for Offensive Cybersecurity Agents

May 23, 2025
Via arXiv

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024
Via arXiv

An Adversarial Perspective on Machine Unlearning for AI Safety

Sep 26, 2024
Via arXiv

Evaluating Copyright Takedown Methods for Language Models

Jun 26, 2024
Via arXiv

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024
Via arXiv

A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Jun 13, 2024
Via arXiv

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024
Via arXiv

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024
Via arXiv