Pepa Atanasova

University of Copenhagen, Denmark

Understanding helpfulness and harmless tension in reward models

Jun 11, 2026

Investigating the Interplay between Contextual and Parametric Chain-of-Thought Faithfulness under Optimization

May 24, 2026

Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations

Jan 01, 2026

Self-Critique and Refinement for Faithful Natural Language Explanations

May 28, 2025

A Reality Check on Context Utilisation for Retrieval-Augmented Generation

Dec 22, 2024

Graph-Guided Textual Explanation Generation Framework

Dec 16, 2024

From Internal Conflict to Contextual Adaptation of Language Models

Jul 24, 2024

A Unified Framework for Input Feature Attribution Analysis

Jun 21, 2024

Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

Apr 29, 2024

Explaining Interactions Between Text Spans

Oct 20, 2023