Abstract: Guard models are a critical component of LLM safety, but their sensitivity to superficial linguistic variations remains a key vulnerability. We show that even meaning-preserving paraphrases can cause large fluctuations in safety scores, revealing a lack of semantic grounding. To address this, we introduce a practical, self-supervised framework for improving the semantic robustness of guard models. Our method leverages paraphrase sets to enforce prediction consistency using a novel, skew-aware aggregation strategy for robust target computation. Notably, we find that standard aggregation methods like mean and median can degrade safety, underscoring the need for skew-aware alternatives. We analyze six open-source guard models and show that our approach reduces semantic variability across paraphrases by ~58%, improves benchmark accuracy by ~2.5% on average, and generalizes to unseen stylistic variations. Intriguingly, we discover a bidirectional relationship between model calibration and consistency: our robustness training improves calibration by up to 40%, revealing a fundamental connection between these properties. These results highlight the value of treating semantic consistency as a first-class training objective and provide a scalable recipe for building more reliable guard models.
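
The abstract does not spell out the skew-aware aggregation rule, so the following is only a minimal sketch of what a paraphrase-consistency objective with a skew-aware target could look like. The quantile-selection rule, the function names (`skew_aware_target`, `consistency_loss`), and the squared-error penalty are illustrative assumptions, not the paper's method.

```python
import torch

def skew_aware_target(scores: torch.Tensor) -> torch.Tensor:
    # Illustrative skew-aware aggregation (an assumption, not the paper's rule):
    # use the median for roughly symmetric paraphrase-score distributions, but
    # for strongly skewed ones aggregate with a quantile on the side opposite
    # the long tail, so a single outlying paraphrase drags the target less
    # than a mean would.
    mean, std = scores.mean(), scores.std().clamp_min(1e-6)
    skew = ((scores - mean) ** 3).mean() / std ** 3   # rough sample skewness
    q = 0.5 if skew.abs() < 0.5 else (0.25 if skew > 0 else 0.75)
    return torch.quantile(scores, q)

def consistency_loss(scores: torch.Tensor) -> torch.Tensor:
    # Pull every paraphrase's safety score toward the shared robust target.
    target = skew_aware_target(scores).detach()
    return ((scores - target) ** 2).mean()

# Toy usage: unsafe-probability scores a guard model assigns to one prompt's
# paraphrase set; the outlier at 0.35 is what the consistency term penalizes.
scores = torch.tensor([0.82, 0.79, 0.35, 0.88, 0.80], requires_grad=True)
consistency_loss(scores).backward()
```

In practice the consistency term would be added to the guard model's standard classification loss, with `scores` produced by the model itself for each paraphrase set.
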
Abstract: We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check whether it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to enlarge the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline from the augmented dataset with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach substantially improves the test performance of the policy in the considered environments.
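
A minimal sketch of the translation-equivariance check and dataset augmentation described above, assuming a deterministic learned dynamics model f(s, a) -> s'. The tolerance test stands in for the paper's entropy-regularized construction of the equivariant set, and the dataset layout, `tol`, and `deltas` are illustrative assumptions.

```python
import numpy as np

def equivariance_error(dynamics, s, a, delta):
    # Translation equivariance of a deterministic dynamics model f(s, a) -> s':
    # f(s + delta, a) should match f(s, a) + delta.
    return np.linalg.norm(dynamics(s + delta, a) - (dynamics(s, a) + delta))

def augment_dataset(dataset, dynamics, deltas, tol=0.05):
    # Keep a translated copy of a transition only where the model is
    # (approximately) equivariant to that translation.
    augmented = list(dataset)
    for s, a, r, s_next in dataset:          # dataset of (s, a, r, s') tuples
        for delta in deltas:
            if equivariance_error(dynamics, s, a, delta) < tol:
                augmented.append((s + delta, a, r, s_next + delta))
    return augmented
```

The augmented transitions would then be passed to an off-the-shelf offline RL algorithm, as in the final step of the abstract.
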




Abstract: Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a 1.2-10x performance increase in high-dimensional control problems.
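
The abstract names two ingredients, temporally-correlated actions and memory, without giving details, so the sketch below only illustrates one plausible form of each. The AR(1)-style noise, the hyperparameters, and `score_fn` (which would roll an action sequence through the learned model and return its predicted return) are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def correlated_noise(horizon, dim, beta=0.5, rng=None):
    # Temporally correlated (AR(1)-style) noise over the planning horizon:
    # one hypothetical way to realize "temporally-correlated actions".
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((horizon, dim))
    for t in range(1, horizon):
        eps[t] = beta * eps[t - 1] + np.sqrt(1.0 - beta ** 2) * eps[t]
    return eps

def cem_plan(score_fn, horizon, dim, iters=5, pop=64, n_elite=8,
             elite_memory=None):
    # Minimal CEM loop; `elite_memory` (elites returned by the previous call)
    # is re-scored alongside fresh samples, which is the "memory" ingredient.
    mean, std = np.zeros((horizon, dim)), np.ones((horizon, dim))
    for _ in range(iters):
        samples = np.stack([mean + std * correlated_noise(horizon, dim)
                            for _ in range(pop)])
        if elite_memory is not None:
            samples = np.concatenate([samples, elite_memory])
        scores = np.array([score_fn(seq) for seq in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # highest-scoring
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean, elites   # plan (execute mean[0]) and memory for the next call
```

In a receding-horizon loop, the `elites` returned at one control step would be fed back as `elite_memory` at the next, so good action sequences found earlier are not discarded.
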