For many applications of classifiers to medical images, a trustworthy label for each image can be difficult or expensive to obtain. In contrast, images without labels are more readily available. Two major research directions both promise that additional unlabeled data can improve classifier performance: self-supervised learning pretrains useful representations on unlabeled data only, then fine-tunes a classifier on these representations via the labeled set; semi-supervised learning directly trains a classifier on labeled and unlabeled data simultaneously. Recent methods from both directions have claimed significant gains on non-medical tasks, but do not systematically assess medical images and mostly compare only to methods in the same direction. This study contributes a carefully-designed benchmark to help answer a practitioner's key question: given a small labeled dataset and a limited budget of hours to spend on training, what gains from additional unlabeled images are possible and which methods best achieve them? Unlike previous benchmarks, ours uses realistic-sized validation sets to select hyperparameters, assesses runtime-performance tradeoffs, and bridges two research fields. By comparing 6 semi-supervised methods and 5 self-supervised methods to strong labeled-only baselines on 3 medical datasets with 30-1000 labels per class, we offer insights to resource-constrained, results-focused practitioners: MixMatch, SimCLR, and BYOL represent strong choices that were not surpassed by more recent methods. After much effort selecting hyperparameters on one dataset, we publish settings that enable strong methods to perform well on new medical tasks within a few hours, with further search over dozens of hours delivering modest additional gains.
Collaborative robots are being increasingly utilized in industrial production lines due to their efficiency and accuracy. However, the close proximity between humans and robots can pose safety risks due to the robot's high-speed movements and powerful forces. To address this, we developed a vision-based safety monitoring system that creates a 3D reconstruction of the collaborative scene. Our system records the human-robot interaction data in real-time and reproduce their virtual replicas in a simulator for offline analysis. The objective is to provide workers with a user-friendly visualization tool for reviewing performance and diagnosing failures, thereby enhancing safety in manufacturing settings.
Most generic object detectors are mainly built for standard object detection tasks such as COCO and PASCAL VOC. They might not work well and/or efficiently on tasks of other domains consisting of images that are visually different from standard datasets. To this end, many advances have been focused on adapting a general-purposed object detector with limited domain-specific designs. However, designing a successful task-specific detector requires extraneous manual experiments and parameter tuning through trial and error. In this paper, we first propose and examine a fully-automatic pipeline to design a fully-specialized detector (FSD) which mainly incorporates a neural-architectural-searched model by exploring ideal network structures over the backbone and task-specific head. On the DeepLesion dataset, extensive results show that FSD can achieve 3.1 mAP gain while using approximately 40% fewer parameters on binary lesion detection task and improved the mAP by around 10% on multi-type lesion detection task via our region-aware graph modeling compared with existing general-purposed medical lesion detection networks.
Aortic stenosis (AS) is a degenerative valve condition that causes substantial morbidity and mortality. This condition is under-diagnosed and under-treated. In clinical practice, AS is diagnosed with expert review of transthoracic echocardiography, which produces dozens of ultrasound images of the heart. Only some of these views show the aortic valve. To automate screening for AS, deep networks must learn to mimic a human expert's ability to identify views of the aortic valve then aggregate across these relevant images to produce a study-level diagnosis. We find previous approaches to AS detection yield insufficient accuracy due to relying on inflexible averages across images. We further find that off-the-shelf attention-based multiple instance learning (MIL) performs poorly. We contribute a new end-to-end MIL approach with two key methodological innovations. First, a supervised attention technique guides the learned attention mechanism to favor relevant views. Second, a novel self-supervised pretraining strategy applies contrastive learning on the representation of the whole study instead of individual images as commonly done in prior literature. Experiments on an open-access dataset and an external validation set show that our approach yields higher accuracy while reducing model size.
One fundamental limitation to the research of bird strike prevention is the lack of a large-scale dataset taken directly from real-world airports. Existing relevant datasets are either small in size or not dedicated for this purpose. To advance the research and practical solutions for bird strike prevention, in this paper, we present a large-scale challenging dataset AirBirds that consists of 118,312 time-series images, where a total of 409,967 bounding boxes of flying birds are manually, carefully annotated. The average size of all annotated instances is smaller than 10 pixels in 1920x1080 images. Images in the dataset are captured over 4 seasons of a whole year by a network of cameras deployed at a real-world airport, covering diverse bird species, lighting conditions and 13 meteorological scenarios. To the best of our knowledge, it is the first large-scale image dataset that directly collects flying birds in real-world airports for bird strike prevention. This dataset is publicly available at https://airbirdsdata.github.io/.
Semi-supervised learning (SSL) promises gains in accuracy compared to training classifiers on small labeled datasets by also training on many unlabeled images. In realistic applications like medical imaging, unlabeled sets will be collected for expediency and thus uncurated: possibly different from the labeled set in represented classes or class frequencies. Unfortunately, modern deep SSL often makes accuracy worse when given uncurated unlabeled sets. Recent remedies suggest filtering approaches that detect out-of-distribution unlabeled examples and then discard or downweight them. Instead, we view all unlabeled examples as potentially helpful. We introduce a procedure called Fix-A-Step that can improve heldout accuracy of common deep SSL methods despite lack of curation. The key innovations are augmentations of the labeled set inspired by all unlabeled data and a modification of gradient descent updates to prevent following the multi-task SSL loss from hurting labeled-set accuracy. Though our method is simpler than alternatives, we show consistent accuracy gains on CIFAR-10 and CIFAR-100 benchmarks across all tested levels of artificial contamination for the unlabeled sets. We further suggest a real medical benchmark for SSL: recognizing the view type of ultrasound images of the heart. Our method can learn from 353,500 truly uncurated unlabeled images to deliver gains that generalize across hospitals.
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
A robot needs multiple interaction modes to robustly collaborate with a human in complicated industrial tasks. We develop a Coexistence-and-Cooperation (CoCo) human-robot collaboration system. Coexistence mode enables the robot to work with the human on different sub-tasks independently in a shared space. Cooperation mode enables the robot to follow human guidance and recover failures. A human intention tracking algorithm takes in both human and robot motion measurements as input and provides a switch on the interaction modes. We demonstrate the effectiveness of CoCo system in a use case analogous to a real world multi-step assembly task.
With the development of smart grid, the operation and control of power system is realized through power communication network, especially the power production and enterprise management business involve a large amount of sensitive information, and the requirements for data security and real-time transmission are gradually improved. In this paper, a new 9D complex chaotic system with quaternion is proposed for the encryption of smart grid data. Firstly, a new 9D complex chaotic system with quaternion is proposed, and its attractors, bifurcation diagram, complexity, and 0-1 test are analyzed. Secondly, the pseudo-random sequences are generated by the new chaotic system to encrypt power data. Finally, the proposed encryption algorithm is verifed with power data and images in the smart grid, which can ensure the encryption security and real-time. The verifcation results show that the proposed encryption scheme is technically feasible and available for power data and image encryption in smart grid.