Bruce W. Lee

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Feb 12, 2025

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

Measuring Visual Sycophancy in Multimodal Models

Aug 17, 2024

Language Models Show Stable Value Orientations Across Diverse Role-Plays

Aug 16, 2024

HyperCLOVA X Technical Report

Apr 13, 2024

Tasks That Language Models Don't Learn

Feb 17, 2024

Instruction Tuning with Human Curriculum

Oct 14, 2023

A Side-by-side Comparison of Transformers for English Implicit Discourse Relation Classification

Jul 07, 2023

Linguistic Properties of Truthful Response

May 25, 2023

LFTK: Handcrafted Features in Computational Linguistics

May 25, 2023