Long Phan

Improving Alignment and Robustness with Circuit Breakers

Jun 10, 2024

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

Feb 21, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Feb 06, 2024

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

Mar 30, 2023

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022

MTet: Multi-domain Translation for English and Vietnamese

Oct 19, 2022