Detecting small to tiny targets in infrared images is a challenging task in computer vision, especially when it comes to differentiating these targets from noisy or textured backgrounds. Traditional object detection methods such as YOLO struggle to detect tiny objects compared to segmentation neural networks, resulting in weaker performance when detecting small targets. To reduce the number of false alarms while maintaining a high detection rate, we introduce an $\textit{a contrario}$ decision criterion into the training of a YOLO detector. The latter takes advantage of the $\textit{unexpectedness}$ of small targets to discriminate them from complex backgrounds. Adding this statistical criterion to a YOLOv7-tiny bridges the performance gap between state-of-the-art segmentation methods for infrared small target detection and object detection networks. It also significantly increases the robustness of YOLO towards few-shot settings.
In this work, we use the Belief Function Theory which extends the probabilistic framework in order to provide uncertainty bounds to different categories of crowd density estimators. Our method allows us to compare the multi-scale performance of the estimators, and also to characterize their reliability for crowd monitoring applications requiring varying degrees of prudence.
This paper addresses the problem of head detection in crowded environments. Our detection is based entirely on the geometric consistency across cameras with overlapping fields of view, and no additional learning process is required. We propose a fully unsupervised method for inferring scene and camera geometry, in contrast to existing algorithms which require specific calibration procedures. Moreover, we avoid relying on the presence of body parts other than heads or on background subtraction, which have limited effectiveness under heavy clutter. We cast the head detection problem as a stereo MRF-based optimization of a dense pedestrian height map, and we introduce a constraint which aligns the height gradient according to the vertical vanishing point direction. We validate the method in an outdoor setting with varying pedestrian density levels. With only three views, our approach is able to detect simultaneously tens of heavily occluded pedestrians across a large, homogeneous area.
This paper proposes a method for computing efficiently the significance of a parametric pattern inside a binary image. On the one hand, a-contrario strategies avoid the user involvement for tuning detection thresholds, and allow one to account fairly for different pattern sizes. On the other hand, a-contrario criteria become intractable when the pattern complexity in terms of parametrization increases. In this work, we introduce a strategy which relies on the use of a cumulative space of reduced dimensionality, derived from the coupling of a classic (Hough) cumulative space with an integral histogram trick. This space allows us to store partial computations which are required by the a-contrario criterion, and to evaluate the significance with a lower computational cost than by following a straightforward approach. The method is illustrated on synthetic examples on patterns with various parametrizations up to five dimensions. In order to demonstrate how to apply this generic concept in a real scenario, we consider a difficult crack detection task in still images, which has been addressed in the literature with various local and global detection strategies. We model cracks as bounded segments, detected by the proposed a-contrario criterion, which allow us to introduce additional spatial constraints based on their relative alignment. On this application, the proposed strategy yields state-of the-art results, and underlines its potential for handling complex pattern detection tasks.
This paper introduces an innovative approach for handling 2D compound hypotheses within the Belief Function Theory framework. We propose a polygon-based generic rep- resentation which relies on polygon clipping operators. This approach allows us to account in the computational cost for the precision of the representation independently of the cardinality of the discernment frame. For the BBA combination and decision making, we propose efficient algorithms which rely on hashes for fast lookup, and on a topological ordering of the focal elements within a directed acyclic graph encoding their interconnections. Additionally, an implementation of the functionalities proposed in this paper is provided as an open source library. Experimental results on a pedestrian localization problem are reported. The experiments show that the solution is accurate and that it fully benefits from the scalability of the 2D search space granularity provided by our representation.