Abstract:Unsupervised anomaly detection is vital in industrial fields, with reconstruction-based methods favored for their simplicity and effectiveness. However, reconstruction methods often encounter an identical shortcut issue, where both normal and anomalous regions can be well reconstructed and fail to identify outliers. The severity of this problem increases with the complexity of the normal data distribution. Consequently, existing methods may exhibit excellent detection performance in a specific scenario, but their performance sharply declines when transferred to another scenario. This paper focuses on establishing a universal model applicable to anomaly detection tasks across different settings, termed as universal anomaly detection. In this work, we introduce a novel, straightforward yet efficient framework for universal anomaly detection: \uline{F}eature \uline{S}huffling and \uline{R}estoration (FSR), which can alleviate the identical shortcut issue across different settings. First and foremost, FSR employs multi-scale features with rich semantic information as reconstruction targets, rather than raw image pixels. Subsequently, these multi-scale features are partitioned into non-overlapping feature blocks, which are randomly shuffled and then restored to their original state using a restoration network. This simple paradigm encourages the model to focus more on global contextual information. Additionally, we introduce a novel concept, the shuffling rate, to regulate the complexity of the FSR task, thereby alleviating the identical shortcut across different settings. Furthermore, we provide theoretical explanations for the effectiveness of FSR framework from two perspectives: network structure and mutual information. Extensive experimental results validate the superiority and efficiency of the FSR framework across different settings.Code is available at https://github.com/luow23/FSR.




Abstract:This paper presents a novel framework, named Global-Local Correspondence Framework (GLCF), for visual anomaly detection with logical constraints. Visual anomaly detection has become an active research area in various real-world applications, such as industrial anomaly detection and medical disease diagnosis. However, most existing methods focus on identifying local structural degeneration anomalies and often fail to detect high-level functional anomalies that involve logical constraints. To address this issue, we propose a two-branch approach that consists of a local branch for detecting structural anomalies and a global branch for detecting logical anomalies. To facilitate local-global feature correspondence, we introduce a novel semantic bottleneck enabled by the visual Transformer. Moreover, we develop feature estimation networks for each branch separately to detect anomalies. Our proposed framework is validated using various benchmarks, including industrial datasets, Mvtec AD, Mvtec Loco AD, and the Retinal-OCT medical dataset. Experimental results show that our method outperforms existing methods, particularly in detecting logical anomalies.