Pedestrian detection in crowds is challenging because of intra-class occlusion, and a detector needs additional prior information to be robust against it. The human head is a naturally strong cue because of its stable appearance, high visibility, and consistent location relative to the body. Motivated by this, we add an extra branch that performs semantic head detection in parallel with the conventional body branch. Instead of manually labeling head regions, we use weak annotations inferred directly from the body boxes, which we call the `semantic head'. Head detection is thereby formulated as using a specific part of the labeled box to detect the corresponding part of the human body, which surprisingly improves both performance and robustness to occlusion. Moreover, we explicitly exploit the head-body alignment structure by introducing an Alignment Loss, which functions in a self-supervised manner. Building on these components, we propose the head-body alignment net (HBAN), which aims to enhance pedestrian detection by fully utilizing the human head prior. Comprehensive evaluations on the CityPersons dataset demonstrate the effectiveness of HBAN.
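To make the idea of inferring a weak head annotation from a body box concrete, the following is a minimal sketch only. The abstract does not state how the head region is derived, so the function name `semantic_head_box` and the fractions `height_frac` and `width_frac` are hypothetical choices for illustration, not the values used by HBAN. The sketch assumes boxes in `(x1, y1, x2, y2)` image coordinates with the head at the top of the body box.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) in image coordinates, with y growing downward.
Box = Tuple[float, float, float, float]


def semantic_head_box(body: Box,
                      height_frac: float = 0.2,
                      width_frac: float = 0.5) -> Box:
    """Infer a weak head box from a labeled body box.

    ASSUMPTION: the head is taken as a horizontally centered strip at the
    top of the body box. The two fractions are illustrative placeholders,
    not values reported in the source.
    """
    x1, y1, x2, y2 = body
    body_w = x2 - x1
    body_h = y2 - y1
    center_x = (x1 + x2) / 2.0

    head_w = body_w * width_frac
    head_h = body_h * height_frac

    # The head box shares the top edge of the body box.
    return (center_x - head_w / 2.0, y1, center_x + head_w / 2.0, y1 + head_h)


if __name__ == "__main__":
    print(semantic_head_box((0.0, 0.0, 40.0, 100.0)))
```

Because the head box is a deterministic function of the body box, no extra manual labeling is needed under this reading, which matches the weak-annotation idea described above.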
Training a robust classifier and an accurate box regressor is difficult for occluded pedestrian detection. The traditionally adopted Intersection over Union (IoU) measurement does not account for the occluded region of the object, which leads to improper training samples. To address this issue, we propose a modification called visible IoU that explicitly incorporates the visible ratio when selecting samples. In addition, a newly designed box sign predictor is placed in parallel with the box regressor to separately predict the moving direction of training samples. It yields higher localization accuracy by introducing a sign prediction loss during training and sign refinement at test time. With these contributions, we obtain state-of-the-art performance on the CityPersons benchmark for occluded pedestrian detection.
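The two ideas above can be illustrated with a small sketch. The abstract does not give the exact definition of visible IoU or of the sign targets, so both functions below are assumptions: `visible_iou` scales the standard IoU by the fraction of the visible ground-truth region that a sample covers, and `sign_targets` takes the sign of the offset in center and size between a sample and its ground-truth box. All names are hypothetical.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) in image coordinates.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> Box:
    """Overlap rectangle of two boxes (may be inverted if they are disjoint)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union."""
    inter = area(intersection(a, b))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def visible_iou(sample: Box, gt_full: Box, gt_visible: Box) -> float:
    """ASSUMED form: IoU with the full box, weighted by the fraction of the
    visible region covered by the sample. Not the formula from the source."""
    vis_area = area(gt_visible)
    if vis_area <= 0:
        return 0.0
    covered = area(intersection(sample, gt_visible)) / vis_area
    return iou(sample, gt_full) * covered


def sign_targets(sample: Box, gt: Box) -> Tuple[int, int, int, int]:
    """ASSUMED form: signs (+1, 0, -1) of the offsets in center x, center y,
    width, and height that would move the sample onto the ground truth."""
    def to_cxcywh(b: Box) -> Tuple[float, float, float, float]:
        return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0, b[2] - b[0], b[3] - b[1])

    def sign(d: float) -> int:
        return (d > 0) - (d < 0)

    s, g = to_cxcywh(sample), to_cxcywh(gt)
    return (sign(g[0] - s[0]), sign(g[1] - s[1]), sign(g[2] - s[2]), sign(g[3] - s[3]))
```

Under this assumed definition, a sample that overlaps only the occluded lower half of a pedestrian gets a plain IoU of 0.5 but a visible IoU of 0, so it would no longer be selected as a positive training sample.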