Jan Dubiński

Negation Neglect: When models fail to learn negations in training

May 13, 2026
Via arXiv

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

Apr 28, 2026
Via arXiv

Conditioned Activation Transport for T2I Safety Steering

Mar 03, 2026
Via arXiv

On Stealing Graph Neural Network Models

Nov 13, 2025
Via arXiv

Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses

Oct 09, 2025
Via arXiv

ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts

Aug 28, 2025
Via arXiv

Learning Graph Representation of Agent Diffusers

May 15, 2025
Via arXiv

Maybe I Should Not Answer That, but... Do LLMs Understand The Safety of Their Inputs?

Feb 22, 2025
Via arXiv

Privacy Attacks on Image AutoRegressive Models

Feb 04, 2025
Via arXiv

CDI: Copyrighted Data Identification in Diffusion Models

Nov 19, 2024
Via arXiv