Diogo Schwerz de Lucena

Momentum Point-Perplexity Mechanics in Large Language Models

Aug 11, 2025

Towards Safe and Honest AI Agents with Neural Self-Other Overlap

Dec 20, 2024

Unexpected Benefits of Self-Modeling in Neural Systems

Jul 14, 2024

Rethinking harmless refusals when fine-tuning foundation models

Jun 27, 2024