Abstract: Detecting anomalies in images and video is an essential task for multiple real-world problems, including industrial inspection, computer-assisted diagnosis, and environmental monitoring. Anomaly detection is typically formulated as a one-class classification problem, where the training data consists solely of nominal values, leaving methods built on this assumption susceptible to training label noise. We present a dataset folding method that transforms an arbitrary one-class classifier-based anomaly detector into a fully unsupervised method. This is achieved by making a set of key weak assumptions: that anomalies are uncommon in the training dataset and generally heterogeneous. These assumptions enable us to utilize multiple independently trained instances of a one-class classifier to filter the training dataset for anomalies. This transformation requires no modifications to the underlying anomaly detector; the only changes are algorithmically selected data subsets used for training. We demonstrate that our method can transform a wide variety of one-class classifier anomaly detectors for both images and videos into unsupervised ones. Our method creates the first unsupervised logical anomaly detectors by transforming existing methods. We also demonstrate that our method achieves state-of-the-art performance for unsupervised anomaly detection on the MVTec AD, ViSA, and MVTec Loco AD datasets. As improvements to one-class classifiers are made, our method directly transfers those improvements to the unsupervised domain, linking the domains.
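
The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stride-based fold split, the median aggregation, the threshold of 3.0, and the toy Gaussian scorer (the hypothetical helpers `fit_gaussian_detector` and `fold_filter`) are all assumptions chosen to keep the example small and deterministic.

```python
import statistics
from typing import Callable, List, Sequence

Scorer = Callable[[float], float]


def fit_gaussian_detector(train: Sequence[float]) -> Scorer:
    """Toy one-class detector (hypothetical): distance from the mean in std units."""
    mu = statistics.mean(train)
    sigma = statistics.pstdev(train) or 1e-9  # guard against zero spread
    return lambda x: abs(x - mu) / sigma


def fold_filter(
    data: Sequence[float],
    fit: Callable[[Sequence[float]], Scorer],
    n_folds: int = 4,
    threshold: float = 3.0,
) -> List[int]:
    """Return indices of samples kept after cross-fold filtering.

    Assumption: each sample is scored only by detectors trained on the
    folds that do not contain it, and the median of those scores is
    compared against a fixed threshold.
    """
    if n_folds < 2:
        raise ValueError("need at least two folds")
    # Assumption: deterministic stride folds; a real pipeline would likely shuffle.
    folds = [list(range(k, len(data), n_folds)) for k in range(n_folds)]
    # One independently trained detector per fold.
    scorers = [fit([data[i] for i in fold]) for fold in folds]
    kept = []
    for k, fold in enumerate(folds):
        others = [s for j, s in enumerate(scorers) if j != k]
        for i in fold:
            score = statistics.median([s(data[i]) for s in others])
            if score <= threshold:
                kept.append(i)
    return sorted(kept)


# Toy data: 40 nominal values in [-3, 3] plus two dissimilar anomalies.
nominal = [float((i % 7) - 3) for i in range(40)]
data = nominal + [60.0, -45.0]
kept = fold_filter(data, fit_gaussian_detector)

# The unchanged detector is then retrained on the filtered subset only.
clean_detector = fit_gaussian_detector([data[i] for i in kept])
```

In this toy setting the two anomalies are rare and unlike each other, so most detectors trained on the other folds have seen nothing similar and score them far above the threshold. The detector itself is untouched; only the subset it is trained on changes.
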
Abstract: Recent advancements in cabled ocean observatories have increased the quality and prevalence of underwater videos; this data enables the extraction of high-level, biologically relevant information such as species' behaviours. Despite this increase in capability, most modern methods for the automatic interpretation of underwater videos focus only on detecting and counting organisms. We propose TempNet, an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage encoder that operates spatially, then temporally. TempNet also introduces temporal attention during spatial encoding, as well as Wavelet Down-Sampling pre-processing, to improve model accuracy. Although our system is designed to be generic (i.e., applicable to diverse fish behaviours), we demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events. We compare the proposed approach with a state-of-the-art end-to-end video detection method (ReMotENet) and a hybrid method previously proposed specifically for the detection of sablefish startle events in videos from an existing dataset. Results show that our method outperforms the comparison baselines in multiple metrics, reaching a per-clip accuracy and precision of 80% and 0.81, respectively. This represents a relative improvement of 31% in accuracy and 27% in precision over the compared methods using this dataset. Our computational pipeline is also highly efficient, as it can process each 4-second video clip in only 38 ms. Furthermore, since it does not employ features specific to sablefish startle events, our system can be easily extended to other behaviours in future work.




Abstract: Chest radiographs are used for the diagnosis of multiple critical illnesses (e.g., pneumonia, heart failure, lung cancer); for this reason, systems for the automatic or semi-automatic analysis of these data are of particular interest. An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists, ultimately allowing for better medical care of lung-, heart- and chest-related conditions. We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information that is typically lost in the down-sampling of high-resolution radiographs, a common step in computer-aided diagnostic pipelines. Our proposed approach requires only slight modifications to the input of existing state-of-the-art Convolutional Neural Networks (CNNs), making it easily applicable to existing image classification frameworks. We show that the extra high-frequency components offered by our method increase the classification performance of several CNNs in benchmarks employing the NIH Chest-8 and ImageNet-2017 datasets. Based on our results, we hypothesize that providing frequency-specific coefficients allows the CNNs to specialize in the identification of structures that are particular to a frequency band, ultimately increasing classification performance without an increase in computational load. The implementation of our work is available at github.com/DeclanMcIntosh/LeGallCuda.
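
To make the down-sampling argument concrete, the sketch below applies a single-level 2D wavelet decomposition in plain Python. It is an illustration under stated assumptions, not the authors' code: it swaps in the Haar wavelet with averaging normalization for simplicity (the repository name suggests a LeGall wavelet), and the function names `haar_dwt2` and `haar_idwt2` and the band names are hypothetical.

```python
from typing import List, Tuple

Band = List[List[float]]


def haar_dwt2(image: List[List[float]]) -> Tuple[Band, Band, Band, Band]:
    """Single-level 2D Haar decomposition of an image with even height and width.

    Returns four half-resolution bands: the approximation (a plain 2x2
    average, i.e. an ordinary 2x down-sample) and three detail bands
    holding the information that the average discards.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, h, 2):
        ra, rr, rc, rd = [], [], [], []
        for c in range(0, w, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ra.append((a + b + d + e) / 4)  # local average
            rr.append((a + b - d - e) / 4)  # top row minus bottom row
            rc.append((a - b + d - e) / 4)  # left column minus right column
            rd.append((a - b - d + e) / 4)  # diagonal difference
        approx.append(ra)
        row_detail.append(rr)
        col_detail.append(rc)
        diag_detail.append(rd)
    return approx, row_detail, col_detail, diag_detail


def haar_idwt2(approx: Band, row_detail: Band, col_detail: Band, diag_detail: Band) -> Band:
    """Invert haar_dwt2, showing that the four bands together lose nothing."""
    out: Band = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, u = approx[i][j], row_detail[i][j]
            v, x = col_detail[i][j], diag_detail[i][j]
            top += [s + u + v + x, s + u - v - x]
            bottom += [s - u + v - x, s - u - v + x]
        out += [top, bottom]
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

The approximation band alone is what a conventional down-sampling step would pass on. Stacking the three detail bands alongside it as extra input channels is one way a network could receive the high-frequency content at the same spatial resolution; how the channels are actually fed to the CNNs is not specified in the abstract.
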




Abstract: Global warming is predicted to profoundly impact ocean ecosystems. Fish behavior is an important indicator of changes in such marine environments. Thus, the automatic identification of key fish behavior in videos represents a much-needed tool for marine researchers, enabling them to study climate change-related phenomena. We offer a dataset of sablefish (Anoplopoma fimbria) startle behaviors in underwater videos, and investigate the use of deep learning (DL) methods for behavior detection on this dataset. Our proposed detection system identifies fish instances using DL-based frameworks, determines trajectory tracks, derives novel behavior-specific features, and employs Long Short-Term Memory (LSTM) networks to identify startle behavior in sablefish. Its performance is studied by comparing it with a state-of-the-art DL-based video event detector.