
Boyi Wei

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025

Scaling Latent Reasoning via Looped Language Models

Oct 29, 2025

Dynamic Risk Assessments for Offensive Cybersecurity Agents

May 23, 2025

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Dec 10, 2024

An Adversarial Perspective on Machine Unlearning for AI Safety

Sep 26, 2024

Evaluating Copyright Takedown Methods for Language Models

Jun 26, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024

A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Jun 13, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024