Abstract: While safety alignment and guardrails help large language models (LLMs) avoid harmful outputs, they can also induce overrefusal, i.e., the unwarranted rejection of benign queries that merely appear risky. We present DDOR (Delta Debugging for OverRefusal), a fully automated and explainable framework for overrefusal testing and repair in a black-box setting, where only model inputs and outputs are accessible and internal safety mechanisms remain opaque. DDOR applies delta debugging to localize minimal refusal-triggering fragments (mRTFs), which provide phrase-level, explainable evidence for why a refusal occurs. Conditioned on these mRTFs, DDOR generates diverse, context-rich prompts and performs multi-oracle validation to filter out intrinsically unsafe or ambiguous cases, producing scalable, model-specific overrefusal test suites (approximately 1K cases per model). Beyond evaluation, we use the localized mRTFs to perform targeted prompt repair, substantially reducing overrefusal while preserving the original intent and maintaining safety on genuinely harmful inputs. Overall, DDOR offers a practical end-to-end solution for both evaluating and mitigating overrefusal, improving LLM usability without sacrificing safety.
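
To illustrate the localization step, the following is a minimal sketch of the classic ddmin delta-debugging algorithm applied to a prompt. It is not DDOR's implementation: the token-level granularity, the function names `split` and `ddmin`, and the keyword-based oracle `toy_refuses` (a stand-in for a real black-box model query) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split(items: List[str], n: int) -> List[List[str]]:
    """Partition items into n contiguous, nearly equal, non-empty parts (needs n <= len)."""
    k, m = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < m else 0)
        parts.append(items[start:end])
        start = end
    return parts


def ddmin(tokens: Sequence[str], refuses: Callable[[List[str]], bool]) -> List[str]:
    """Return a 1-minimal sublist of tokens that still triggers a refusal."""
    current = list(tokens)
    if not refuses(current):
        raise ValueError("the full prompt must trigger a refusal")
    n = 2  # current granularity (number of parts)
    while len(current) >= 2:
        parts = split(current, n)
        reduced = False
        # 1) Try to reduce to a single part.
        for part in parts:
            if refuses(part):
                current, n, reduced = part, 2, True
                break
        # 2) Otherwise try to drop a single part (reduce to its complement).
        if not reduced:
            for i in range(len(parts)):
                complement = [t for j, p in enumerate(parts) if j != i for t in p]
                if refuses(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise refine the granularity, or stop at the finest level.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), 2 * n)
    return current


def toy_refuses(tokens: List[str]) -> bool:
    """Hypothetical oracle: 'refuses' whenever both trigger words are present."""
    return "kill" in tokens and "process" in tokens


if __name__ == "__main__":
    prompt = "how do i kill a stuck python process on linux".split()
    print(ddmin(prompt, toy_refuses))
```

In a real black-box setting, the oracle would send the reconstructed prompt to the model under test and classify its response as a refusal or not; since every candidate costs one model query, caching oracle results would likely be worthwhile.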




Abstract: Versatile robots capable of traversing challenging, irregular environments are of increasing interest in robotics, and metameric robots have been identified as a promising solution owing to their slender, deformable bodies. Inspired by the effective locomotion of earthworms, earthworm-like robots capable of both rectilinear and planar locomotion have been designed and prototyped. While much research has focused on kinematic models of the planar locomotion of earthworm-like robots, the authors argue that dynamic models are critical to improving the accuracy and efficiency of these robots. This work presents a comprehensive analysis of the dynamics of a metameric earthworm-like robot capable of planar motion. The model accounts for the complex interactions between the robot's deformable body and the forces acting on it, and draws on methods previously used to develop mathematical models of snake-like robots. The proposed model represents a significant advancement in metameric robotics and has the potential to enhance the performance of earthworm-like robots in challenging environments such as underground pipes and tunnels. It also serves as a foundation for future research into the dynamics of soft-bodied robots.