Rongwu Xu

Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?

May 23, 2025

LIFEBench: Evaluating Length Instruction Following in Large Language Models

May 22, 2025

AI Awareness

Apr 25, 2025

"Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents

Feb 17, 2025

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Oct 31, 2024

Sing it, Narrate it: Quality Musical Lyrics Translation

Oct 29, 2024

On the Role of Attention Heads in Large Language Model Safety

Oct 17, 2024

DebateQA: Evaluating Question Answering on Debatable Knowledge

Aug 02, 2024

Course-Correction: Safety Alignment Using Synthetic Preferences

Jul 23, 2024

Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

Jul 22, 2024