Conventional unsupervised anomaly detection (UAD) methods build separate models for each object category. Recent studies have proposed to train a unified model for multiple classes, namely model-unified UAD. However, such methods still implement the unified model separately on each class during inference with respective anomaly decision thresholds, which hinders their application when the image categories are entirely unavailable. In this work, we present a simple yet powerful method to address multi-class anomaly detection without any class information, namely \textit{absolute-unified} UAD. We target the crux of prior works in this challenging setting: different objects have mismatched anomaly score distributions. We propose Class-Agnostic Distribution Alignment (CADA) to align the mismatched score distribution of each implicit class without knowing class information, which enables unified anomaly detection for all classes and samples. The essence of CADA is to predict each class's score distribution of normal samples given any image, normal or anomalous, of this class. As a general component, CADA can activate the potential of nearly all UAD methods under absolute-unified setting. Our approach is extensively evaluated under the proposed setting on two popular UAD benchmark datasets, MVTec AD and VisA, where we exceed previous state-of-the-art by a large margin.
Most advanced unsupervised anomaly detection (UAD) methods rely on modeling feature representations of frozen encoder networks pre-trained on large-scale datasets, e.g. ImageNet. However, the features extracted from the encoders that are borrowed from natural image domains coincide little with the features required in the target UAD domain, such as industrial inspection and medical imaging. In this paper, we propose a novel epistemic UAD method, namely ReContrast, which optimizes the entire network to reduce biases towards the pre-trained image domain and orients the network in the target domain. We start with a feature reconstruction approach that detects anomalies from errors. Essentially, the elements of contrastive learning are elegantly embedded in feature reconstruction to prevent the network from training instability, pattern collapse, and identical shortcut, while simultaneously optimizing both the encoder and decoder on the target domain. To demonstrate our transfer ability on various image domains, we conduct extensive experiments across two popular industrial defect detection benchmarks and three medical image UAD tasks, which shows our superiority over current state-of-the-art methods.
Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first \textbf{P}ulmonary \textbf{AR}tery \textbf{SE}gmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at \url{https://parse2022.grand-challenge.org/Parse2022/}.
Color fundus photography and Optical Coherence Tomography (OCT) are the two most cost-effective tools for glaucoma screening. Both two modalities of images have prominent biomarkers to indicate glaucoma suspected. Clinically, it is often recommended to take both of the screenings for a more accurate and reliable diagnosis. However, although numerous algorithms are proposed based on fundus images or OCT volumes in computer-aided diagnosis, there are still few methods leveraging both of the modalities for the glaucoma assessment. Inspired by the success of Retinal Fundus Glaucoma Challenge (REFUGE) we held previously, we set up the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge to encourage the development of fundus \& OCT-based glaucoma grading. The primary task of the challenge is to grade glaucoma from both the 2D fundus images and 3D OCT scanning volumes. As part of GAMMA, we have publicly released a glaucoma annotated dataset with both 2D fundus color photography and 3D OCT volumes, which is the first multi-modality dataset for glaucoma grading. In addition, an evaluation framework is also established to evaluate the performance of the submitted methods. During the challenge, 1272 results were submitted, and finally, top-10 teams were selected to the final stage. We analysis their results and summarize their methods in the paper. Since all these teams submitted their source code in the challenge, a detailed ablation study is also conducted to verify the effectiveness of the particular modules proposed. We find many of the proposed techniques are practical for the clinical diagnosis of glaucoma. As the first in-depth study of fundus \& OCT multi-modality glaucoma grading, we believe the GAMMA Challenge will be an essential starting point for future research.
Calcaneus is the largest tarsal bone to withstand the daily stresses of weight bearing. The calcaneal fracture is the most common type in the tarsal bone fractures. After a fracture is suspected, plain radiographs should be taken first. Bohler's Angle (BA) and Critical Angle of Gissane (CAG), measured by four anatomic landmarks in lateral foot radiograph, can aid operative restoration of the fractured calcaneus and fracture diagnosis and assessment. The aim of this study is to develop a system to automatically locate four anatomic landmarks and measure BA and CAG for fracture assessment. To solved the problem of fickle rotation of calcaneus, we proposed a coarse-to-fine Rotation-Invariant Regression-Voting (RIRV) landmark detection method based on Supported Vector Regression (SVR) and Scale Invariant Feature Transform (SIFT) patch descriptor. By implementing a novel normalization approach to convert displacements into coordinates of oriented feature patches, our method is explicit rotation-invariance comparing with traditional regressive method. A multi-stream CNN structure with multi-region input is designed to screen calcaneus fracture. The input ROIs of multi-stream CNN are normalized by detected landmarks to uniform view, orientation and scale. The advantage of our approach is the usage of landmarks using prior knowledge to normalize the inputs of CNN so as to improve the efficiency of CNN. Experiments show that our CNN can accurately identify the fractures with sensitivity of 95.21% and specificity of 95.32%.
Calcaneus is the largest tarsal bone to withstand the daily stresses of weight bearing. The calcaneal fracture is the most common type in the tarsal bone fractures. After a fracture is suspected, plain radiographs should be taken first. Bohler's Angle (BA) and Critical Angle of Gissane (CAG), measured by four anatomic landmarks in lateral foot radiograph, can aid operative restoration of the fractured calcaneus and fracture diagnosis and assessment. The aim of this study is to develop a system to automatically locate four anatomic landmarks and measure BA and CAG for fracture assessment. To solved the problem of fickle rotation of calcaneus, we proposed a coarse-to-fine Rotation-Invariant Regression-Voting (RIRV) landmark detection method based on Supported Vector Regression (SVR) and Scale Invariant Feature Transform (SIFT) patch descriptor. By normalizing displacements into coordinates of oriented SIFT patches, our method is explicit rotation-invariance comparing to traditional regressive method. A multi-stream CNN structure with multi-region input is designed to screen calcaneus fracture. The input ROIs of multi-stream CNN are normalized by detected landmarks to uniform view, orientation and scale. Experiments show that our CNN can accurately identify the fractures with sensitivity of 96.81% and specificity of 90%.
Calcaneus is the largest tarsal bone designed to withstand the daily stresses of weight bearing. The calcaneal fracture is the most common type in the tarsal bone fractures. After a fracture is suspected, plain radiographs should be first requested. Bohler's Angle (BA) and Critical Angle of Gissane (CAG), measured by four anatomic landmarks in lateral foot radiograph, can aid operative restoration of the fractured calcaneus and fracture diagnosis and assessment. Due to the different postures of patient and operating conditions of X-ray machine, the calcaneus in lateral foot radiograph may be subjected to great variance of rotation and scale. The views of the radiographs also vary from partial ankle to whole leg. The aim of this study is to develop a system to automatically locate four anatomic landmarks and calculate the BA and CAG for fracture assessment. We proposed a coarse-to-fine Rotation-Invariant Regression-Voting (RIRV) landmarks detection approach based on Supported Vector Regression (SVR) and Scale Invariant Feature Transform (SIFT) patch descriptor. The study also designed a multi-stream CNN structure with multi-region input which can accurately identify the fractures with sensitivity of 96.81% and specificity of 90%.
One-Shot Neural architecture search (NAS) attracts broad attention recently due to its capacity to reduce the computational hours through weight sharing. However, extensive experiments on several recent works show that there is no positive correlation between the validation accuracy with inherited weights from the supernet and the test accuracy after re-training for One-Shot NAS. Different from devising a controller to find the best performing architecture with inherited weights, this paper focuses on how to sample architectures to train the supernet to make it more predictive. A single-path supernet is adopted, where only a small part of weights are optimized in each step, to reduce the memory demand greatly. Furthermore, we abandon devising complicated reward based architecture sampling controller, and sample architectures to train supernet based on novelty search. An efficient novelty search method for NAS is devised in this paper, and extensive experiments demonstrate the effectiveness and efficiency of our novelty search based architecture sampling method. The best architecture obtained by our algorithm with the same search space achieves the state-of-the-art test error rate of 2.51\% on CIFAR-10 with only 7.5 hours search time in a single GPU, and a validation perplexity of 60.02 and a test perplexity of 57.36 on PTB. We also transfer these search cell structures to larger datasets ImageNet and WikiText-2, respectively.
Bayesian optimization (BO) has been broadly applied to computational expensive problems, but it is still challenging to extend BO to high dimensions. Existing works are usually under strict assumption of an additive or a linear embedding structure for objective functions. This paper directly introduces a supervised dimension reduction method, Sliced Inverse Regression (SIR), to high dimensional Bayesian optimization, which could effectively learn the intrinsic sub-structure of objective function during the optimization. Furthermore, a kernel trick is developed to reduce computational complexity and learn nonlinear subset of the unknowing function when applying SIR to extremely high dimensional BO. We present several computational benefits and derive theoretical regret bounds of our algorithm. Extensive experiments on synthetic examples and two real applications demonstrate the superiority of our algorithms for high dimensional Bayesian optimization.