Weakly supervised object localization (WSOL) relaxes the requirement for dense annotations in object localization by using image-level classification labels to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues and proposes the background-aware classification activation map (B-CAM), which simultaneously learns localization scores of both object and background with only image-level labels. In B-CAM, two image-level features, aggregated from pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of a pure-background sample, respectively. Based on these two features, both an object classifier and a background classifier are learned to determine the binary object localization mask. B-CAM can be trained end-to-end with a proposed stagger classification loss, which not only improves object localization but also suppresses background activation. Experiments show that B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages, and VOC2012 datasets.
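The abstract describes B-CAM only at a high level: it does not give the pooling rule, the classifier form, or the stagger classification loss. The NumPy sketch below illustrates one plausible reading under stated assumptions, not the authors' implementation. It assumes (1) the object and background image-level features are score-weighted averages of pixel features, with a pixel's background weight taken as `1 - obj_score`, and (2) the binary mask marks a pixel as object where the linear object-classifier activation for the target class exceeds the background-classifier activation. The names `aggregate_features`, `localization_mask`, `w_obj`, and `w_bg` are hypothetical, and the loss is not modeled.

```python
import numpy as np


def aggregate_features(features, obj_score, eps=1e-6):
    """Score-weighted pooling of pixel features into image-level object/background features.

    features:  (C, H, W) pixel-level feature map (hypothetical backbone output)
    obj_score: (H, W) object localization scores in [0, 1]
    Assumption: a pixel's background weight is 1 - obj_score.
    """
    bg_score = 1.0 - obj_score
    # Broadcasting (C, H, W) * (H, W) weights every channel by the same spatial map.
    obj_feat = (features * obj_score).sum(axis=(1, 2)) / (obj_score.sum() + eps)
    bg_feat = (features * bg_score).sum(axis=(1, 2)) / (bg_score.sum() + eps)
    return obj_feat, bg_feat


def localization_mask(features, w_obj, w_bg, cls):
    """Binary mask: a pixel is object where the class-`cls` activation beats the background one.

    w_obj: (K, C) hypothetical object-classifier weights, one row per class
    w_bg:  (C,) hypothetical background-classifier weights
    """
    obj_map = np.einsum("c,chw->hw", w_obj[cls], features)  # per-pixel object activation
    bg_map = np.einsum("c,chw->hw", w_bg, features)  # per-pixel background activation
    return obj_map > bg_map


# Toy usage: channel 0 fires on a 2x2 "object", channel 1 fires everywhere else.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 1.0
features[1] = 1.0 - features[0]
mask = localization_mask(features, np.array([[1.0, 0.0]]), np.array([0.0, 1.0]), cls=0)
print(mask.sum())  # 4
```

Thresholding the difference between two learned activations, rather than a single CAM against a hand-tuned cutoff, is one way a background classifier could remove the post-processing step the abstract criticizes; whether B-CAM does exactly this is an assumption here.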
This paper proposes a novel and efficient method to build a Computer-Aided Diagnosis (CAD) system for lung nodule detection based on Computed Tomography (CT). The task is treated as an Object Detection on Video (VID) problem, imitating how a radiologist reads CT scans. A lung nodule detector was trained to automatically learn nodule features from still images and to detect lung nodule candidates with both high recall and high accuracy. Unlike previous work, which used 3-dimensional information around the nodule to reduce false positives, we propose two simple but efficient methods, Multi-slice propagation (MSP) and Motionless-guide suppression (MLGS), which analyze the sequence information of CT scans to reduce false negatives and suppress false positives. We evaluated our method on the open-source LUNA16 dataset, which contains 888 CT scans, and obtained a state-of-the-art result (Free-Response Receiver Operating Characteristic score of 0.892) at a detection speed (end to end within 20 seconds per patient on a single NVIDIA GTX 1080) much faster than existing methods.
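The abstract does not say how MSP and MLGS operate. As a hypothetical illustration of the general idea of exploiting slice-sequence information, the sketch below shows two generic inter-slice heuristics: copying confident 2D detections to adjacent slices where nothing nearby was found (aimed at false negatives), and discarding detections that do not persist over consecutive slices at roughly the same position (aimed at false positives). These are not the paper's MSP or MLGS; `Det`, `propagate`, `suppress_isolated`, and every threshold (`radius`, `min_score`, `decay`, `min_run`) are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    z: int        # slice index
    x: float      # center column (pixels)
    y: float      # center row (pixels)
    score: float  # detector confidence


def _near(a, b, radius):
    """True if two detections' centers lie within `radius` pixels in-plane."""
    return math.hypot(a.x - b.x, a.y - b.y) <= radius


def _by_slice(dets):
    groups = {}
    for d in dets:
        groups.setdefault(d.z, []).append(d)
    return groups


def propagate(dets, num_slices, radius=5.0, min_score=0.8, decay=0.5):
    """Hypothetical: copy confident detections to adjacent slices that lack a nearby hit."""
    groups = _by_slice(dets)
    out = list(dets)
    for d in dets:
        if d.score < min_score:
            continue
        for nz in (d.z - 1, d.z + 1):
            if 0 <= nz < num_slices and not any(_near(d, o, radius) for o in groups.get(nz, [])):
                new = Det(nz, d.x, d.y, d.score * decay)  # down-weighted copy
                out.append(new)
                groups.setdefault(nz, []).append(new)  # avoid duplicate copies
    return out


def suppress_isolated(dets, radius=5.0, min_run=2):
    """Hypothetical: keep a detection only if it spans >= min_run consecutive slices."""
    groups = _by_slice(dets)
    kept = []
    for d in dets:
        run = 1
        for step in (-1, 1):
            z = d.z + step
            while any(_near(d, o, radius) for o in groups.get(z, [])):
                run += 1
                z += step
        if run >= min_run:
            kept.append(d)
    return kept
```

For example, with detections at slices 0 and 1 near pixel (10, 10) plus a single-slice hit at slice 5, `suppress_isolated` keeps the first two and drops the isolated one, on the assumption that real nodules appear on several consecutive slices.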