Martin Pawelczyk

Validity Threats for Foundation Model Research

Jun 03, 2026

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

May 25, 2026

Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks

Mar 16, 2026

Easy Data Unlearning Bench

Feb 18, 2026

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Dec 31, 2024

Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference

Jul 24, 2024

Machine Unlearning Fails to Remove Data Poisoning Attacks

Jun 25, 2024

Towards Non-Adversarial Algorithmic Recourse

Mar 15, 2024

In-Context Unlearning: Language Models as Few Shot Unlearners

Oct 12, 2023

Gaussian Membership Inference Privacy

Jun 12, 2023