A sound evaluation standard underlies the construction of effective deep learning models. However, we find in experiments that deep-learning-based automatic crack detectors are substantially underestimated by the widely used mean Average Precision (mAP) standard. This paper presents a study of this evaluation standard. We clarify that the random fractal nature of cracks invalidates the mAP standard, because the strict box matching in mAP calculation is ill-suited to fractal features. As a solution, a fractal-compatible evaluation standard named CovEval is proposed to correct this underestimation in crack detection. In CovEval, a different matching process based on the idea of covering box matching is adopted. Specifically, the Cover Area rate (CAr) is designed as a covering overlap, and a multi-match strategy is employed to relax the one-to-one matching restriction of mAP. Extended Recall (XR), Extended Precision (XP) and Extended F-score (Fext) are defined for scoring crack detectors. In experiments using several common object detection frameworks, models obtain much higher scores in crack detection under CovEval, which agrees far better with their visual performance. Moreover, based on the Faster R-CNN framework, we present a case study in which a crack detector is optimized against the CovEval standard. The extended recall (XR) of our best model reaches an industrial level at 95.8, which implies that, with a reasonable evaluation standard, object detection methods hold great potential for automatic industrial inspection.
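The covering-overlap idea can be sketched as follows. This is a minimal, grid-approximated reading of CAr and the multi-match scores, not the paper's exact definitions; the threshold `thr`, the pixel-grid approximation and the box format are all illustrative assumptions.

```python
def cover_area_rate(box, covering):
    """CAr sketch: fraction of `box` covered by the union of `covering`
    boxes, approximated on an integer pixel grid. Boxes are (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    total = (x2 - x1) * (y2 - y1)
    covered = sum(
        1
        for x in range(x1, x2)
        for y in range(y1, y2)
        if any(px1 <= x < px2 and py1 <= y < py2 for px1, py1, px2, py2 in covering)
    )
    return covered / total

def extended_scores(gts, preds, thr=0.5):
    """Multi-match sketch: a ground truth counts as recalled if the union of
    predictions covers it beyond `thr`, and a prediction counts as precise if
    the union of ground truths covers it beyond `thr` (no one-to-one pairing)."""
    xr = sum(cover_area_rate(g, preds) >= thr for g in gts) / len(gts)
    xp = sum(cover_area_rate(p, gts) >= thr for p in preds) / len(preds)
    fext = 2 * xr * xp / (xr + xp) if xr + xp else 0.0
    return xr, xp, fext
```

Under this reading, two small boxes tiling one elongated ground-truth crack both count toward recall, which is exactly the case strict one-to-one IoU matching penalizes.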
Deep clustering, which adopts deep neural networks to obtain optimal representations for clustering, has been widely studied recently. In this paper, we propose a novel deep image clustering framework that learns a category-style latent representation in which the category information is disentangled from the image style and can be used directly as the cluster assignment. To achieve this goal, mutual information maximization is applied to embed relevant information in the latent representation. Moreover, an augmentation-invariant loss is employed to disentangle the representation into a category part and a style part. Last but not least, a prior distribution is imposed on the latent representation to ensure that the elements of the category vector can be used as the probabilities over clusters. Comprehensive experiments demonstrate that the proposed approach significantly outperforms state-of-the-art methods on five public datasets.
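The role of the category part can be illustrated with a small sketch: the first few elements of the latent vector are normalized into probabilities over clusters, and their argmax is the cluster assignment. The split point, the softmax normalization and all names here are illustrative assumptions, not the paper's architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a plain list."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def category_style_split(latent, num_clusters):
    """Split a latent vector into a category part (probabilities over
    clusters) and a style part; the argmax of the category part is the
    cluster assignment. Shapes and the split point are assumptions."""
    category = softmax(latent[:num_clusters])
    style = latent[num_clusters:]
    assignment = max(range(num_clusters), key=category.__getitem__)
    return category, style, assignment
```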
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing, since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described only by category names and aims to mine a set of representative terms for each category from a text corpus, helping a user comprehend his/her topics of interest. We develop a novel joint tree and text embedding method, along with a principled optimization procedure, that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
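Category-representative term discovery in a spherical embedding space can be illustrated by ranking terms by cosine similarity to a category embedding. This toy sketch assumes plain unit-normalized vectors and omits the joint tree/text training that JoSH actually performs; all vectors and names are made up for illustration.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_terms(category_vec, term_vecs, k):
    """Rank terms by cosine similarity (dot product of unit vectors) to the
    category embedding; term_vecs maps term -> embedding. Illustrative only."""
    c = normalize(category_vec)
    score = lambda t: sum(a * b for a, b in zip(c, normalize(term_vecs[t])))
    return sorted(term_vecs, key=score, reverse=True)[:k]
```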
To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of their major limitations is that they may suffer from the multi-site issue: models trained on a dataset from one site may not be applicable to datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, the iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month-old infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP), and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results, and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers to work on the multi-site issue.
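The challenge's exact evaluation protocol is described in the article; segmentation overlap for tissue classes such as WM, GM and CSF is commonly scored with the Dice coefficient, sketched here over flattened label maps (label codes are illustrative).

```python
def dice(pred, gt, label):
    """Dice coefficient for one tissue label over two flattened label maps:
    2 * |A ∩ B| / (|A| + |B|), conventionally 1.0 when the label is absent
    from both maps."""
    inter = sum(1 for p, g in zip(pred, gt) if p == label and g == label)
    size = sum(1 for p in pred if p == label) + sum(1 for g in gt if g == label)
    return 2.0 * inter / size if size else 1.0
```

The same per-label computation can be restricted to a region-of-interest mask to score ROIs instead of the whole brain.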
Task allocation is a well-studied problem. In most prior formulations, it is assumed that each task is associated with a unique set of resource requirements. In the scope of the multi-robot task allocation problem, these requirements can be satisfied by a coalition of robots. In this paper, we introduce a more general formulation of the multi-robot task allocation problem that allows more than one option for specifying the set of task requirements: satisfying any one of the options satisfies the task. We refer to this new problem as the multi-robot task allocation problem with task variants. First, we theoretically show that this extension fortunately does not change the complexity class, which remains NP-complete. For solution methods, we adapt two previous greedy methods for the task allocation problem without task variants to solve this new problem and analyze their effectiveness. In particular, we "flatten" the new problem into the problem without task variants, modify the previous methods to solve the flattened problem, and prove that the bounds still hold. Finally, we thoroughly evaluate these two methods along with a random baseline to demonstrate their efficacy for the new problem.
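The flattening step, and a simple greedy coalition builder on top of it, can be sketched as follows. This toy version models requirements as capability sets and uses a first-satisfiable-variant rule; these are illustrative assumptions, not the paper's bound-preserving algorithms.

```python
def flatten_variants(tasks):
    """tasks: dict task_id -> list of variant requirement sets.
    Flatten to (task_id, requirement_set) pairs, one per variant, so that
    methods for the variant-free problem can be applied."""
    return [(tid, opt) for tid, opts in tasks.items() for opt in opts]

def min_coalition(option, available):
    """Greedy set cover: repeatedly pick the robot covering the most
    still-needed capabilities; None if the option cannot be satisfied."""
    needed, pool, chosen = set(option), dict(available), []
    while needed:
        best = max(pool, key=lambda r: len(pool[r] & needed), default=None)
        if best is None or not (pool[best] & needed):
            return None
        chosen.append(best)
        needed -= pool.pop(best)
    return chosen

def greedy_allocate(tasks, robots):
    """robots: dict robot_id -> capability set. Assign each task the first
    variant for which a coalition can be formed from unassigned robots."""
    available, assignment = dict(robots), {}
    for tid, options in tasks.items():
        for opt in options:  # first satisfiable variant wins in this sketch
            coalition = min_coalition(opt, available)
            if coalition is not None:
                assignment[tid] = (opt, coalition)
                for r in coalition:
                    del available[r]
                break
    return assignment
```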
One simplifying assumption made in distributed robot systems is that the robots are single-tasking: each robot operates on a single task at any time. While such an assumption is innocuous in situations with sufficient resources, where the robots can operate independently, it becomes impractical when they must share their capabilities. In this paper, we consider multi-tasking robots with multi-robot tasks. Given a set of tasks, each achievable by a coalition of robots, our approach allows the coalitions to overlap and task synergies to be exploited by reasoning about the physical constraints that can be synergistically satisfied to achieve the tasks. The key contribution of this work is a general and flexible framework that gives multi-robot systems in resource-constrained situations this ability to extend their capabilities. The proposed approach is built on information invariant theory, which specifies the interactions between information requirements. In our work, we map physical constraints to information requirements, thereby allowing task synergies to be identified via the information invariant framework. We show that our algorithm is sound and complete under a problem setting with multi-tasking robots. Simulation results show its effectiveness in resource-constrained situations and in handling challenging scenarios in a multi-UAV simulator.
Visible images have been widely used for indoor motion estimation. Thermal images, in contrast, are more challenging to use for motion estimation, since they typically have lower resolution, less texture, and more noise. In this paper, a novel dataset for evaluating the performance of multi-spectral motion estimation systems is presented. The dataset includes both multi-spectral and dense depth images, with accurate ground-truth camera poses provided by a motion capture system. All sequences are recorded with a handheld multi-spectral device, which consists of a standard visible-light camera, a long-wave infrared camera, and a depth camera. The multi-spectral images, including both color and thermal images at full sensor resolution (640 $\times$ 480), are obtained from the hardware-synchronized standard and long-wave infrared cameras at 32 Hz. The depth images are captured by a Microsoft Kinect2 and can benefit the learning of cross-modality stereo matching. In addition to sequences with bright illumination, the dataset also contains scenes with dim or varying illumination. The full dataset, including both raw data and calibration data with detailed specifications of the data format, is publicly available.
Deep learning based methods have come to dominate text recognition tasks across diverse and multilingual scenarios. Offline handwritten Chinese text recognition (HCTR) is one of the most challenging such tasks, because it involves thousands of characters, varied writing styles and a complex data collection process. Recently, recurrence-free architectures for text recognition have appeared competitive, owing to their high parallelism and comparable results. In this paper, we build models using only convolutional neural networks and use CTC as the loss function. To reduce overfitting, we apply dropout after each max-pooling layer, with an extremely high rate on the last one before the linear layer. The CASIA-HWDB database is selected to tune and evaluate the proposed models. Using existing text samples as templates, we randomly choose isolated character samples to synthesize additional text samples for training. We finally achieve a 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
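Two bookkeeping details from the description above can be sketched with plain helpers: the feature-sequence length after repeated max-pooling (which must not fall below the label length for CTC to be computable), and a layer plan placing an extremely high dropout rate on the last block. All concrete values here are illustrative assumptions, not the paper's configuration.

```python
def ctc_output_length(input_width, num_pools, pool_stride=2):
    """Feature-sequence length fed to the CTC loss after num_pools
    max-pooling layers, each dividing the width by pool_stride."""
    w = input_width
    for _ in range(num_pools):
        w //= pool_stride
    return w

def build_layer_spec(num_blocks, base_dropout=0.1, final_dropout=0.9):
    """Conv/pool blocks, each followed by dropout; only the last block uses
    the extremely high rate. Rates and block count are assumptions."""
    spec = []
    for i in range(num_blocks):
        p = final_dropout if i == num_blocks - 1 else base_dropout
        spec += [("conv3x3",), ("maxpool2x2",), ("dropout", p)]
    spec.append(("linear_ctc",))
    return spec
```

For example, four pooling stages shrink a 1024-pixel-wide text line to 64 time steps, so target transcriptions longer than 64 characters would make the CTC loss undefined.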
The current pandemic, caused by the outbreak of a novel coronavirus (COVID-19) in December 2019, has led to a global emergency that has significantly impacted economies, healthcare systems and personal wellbeing all around the world. Controlling the rapidly evolving disease requires highly sensitive and specific diagnostics. While real-time RT-PCR is the most commonly used test, it can take up to 8 hours and requires significant effort from healthcare professionals. As such, there is a critical need for a quick and automatic diagnostic system. Diagnosis from chest CT images is a promising direction. However, current studies are limited by the lack of sufficient training samples, as acquiring annotated CT images is time-consuming. To this end, we propose a new deep learning algorithm for the automated diagnosis of COVID-19 that requires only a few samples for training. Specifically, we use contrastive learning to train an encoder that captures expressive feature representations on large, publicly available lung datasets, and adopt a prototypical network for classification. We validate the efficacy of the proposed model against other competing methods on two publicly available, annotated COVID-19 CT datasets. Our results demonstrate the superior performance of our model for the accurate diagnosis of COVID-19 based on chest CT images.
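The prototypical-network classification step can be sketched as follows, assuming embeddings already produced by the contrastively pre-trained encoder: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. Labels and vectors here are purely illustrative.

```python
def prototype(vectors):
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, support):
    """support: label -> list of encoder embeddings for that class.
    Predict the label whose prototype is nearest to the query embedding."""
    protos = {lab: prototype(vs) for lab, vs in support.items()}
    return min(protos, key=lambda lab: sq_dist(query, protos[lab]))
```

Because only the prototypes depend on labeled data, a handful of annotated CT embeddings per class suffices at inference time, which is the few-shot property the abstract relies on.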