Abstract:We aim to investigate the impact of image and signal properties on visual attention mechanisms during a signal detection task in digital images. The application of insight yielded from this work spans many areas of digital imaging where signal or pattern recognition is involved in complex heterogenous background. We used simulated tomographic breast images as the platform to investigate this question. While radiologists are highly effective at analyzing medical images to detect and diagnose diseases, misdiagnosis still occurs. We selected digital breast tomosynthesis (DBT) images as a sample medical images with different breast densities and structures using digital breast phantoms (Bakic and XCAT). Two types of lesions (with distinct spatial frequency properties) were randomly inserted in the phantoms during projections to generate abnormal cases. Six human observers participated in observer study designed for a locating and detection of an 3-mm sphere lesion and 6-mm spicule lesion in reconstructed in-plane DBT slices. We collected eye-gaze data to estimate gaze metrics and to examine differences in visual attention mechanisms. We found that detection performance in complex visual environments is strongly constrained by later perceptual stages, with decision failures accounting for the largest proportion of errors. Signal detectability is jointly influenced by both target morphology and background complexity, revealing a critical interaction between local signal features and global anatomical noise. Increased fixation duration on spiculated lesions suggests that visual attention is differentially engaged depending on background and signal spatial frequency dependencies.
Abstract:This study advances task-based image quality assessment by developing an anthropomorphic thresholded visual-search model observer. The model is an ideal observer for thresholded data inspired by the human visual system, allowing selective processing of high-salience features to improve discrimination performance. By filtering out irrelevant variability, the model enhances diagnostic accuracy and computational efficiency. The observer employs a two-stage framework: candidate selection and decision-making. Using thresholded data during candidate selection refines regions of interest, while stage-specific feature processing optimizes performance. Simulations were conducted to evaluate the effects of thresholding on feature maps, candidate localization, and multi-feature scenarios. Results demonstrate that thresholding improves observer performance by excluding low-salience features, particularly in noisy environments. Intermediate thresholds often outperform no thresholding, indicating that retaining only relevant features is more effective than keeping all features. Additionally, the model demonstrates effective training with fewer images while maintaining alignment with human performance. These findings suggest that the proposed novel framework can predict human visual search performance in clinically realistic tasks and provide solutions for model observer training with limited resources. Our novel approach has applications in other areas where human visual search and detection tasks are modeled such as in computer vision, machine learning, defense and security image analysis.
Abstract:Understanding human visual search behavior is a fundamental problem in vision science and computer vision, with direct implications for modeling how observers allocate attention in location-unknown search tasks. In this study, we investigate the relationship between Gabor-based features and gray-level co-occurrence matrix (GLCM) based texture features in modeling early-stage visual search behavior. Two feature-combination pipelines are proposed to integrate Gabor and GLCM features for narrowing the region of possible human fixations. The pipelines are evaluated using simulated digital breast tomosynthesis images. Results show qualitative agreement among fixation candidates predicted by the proposed pipelines and a threshold-based model observer. A strong correlation is observed between GLCM mean and Gabor feature responses, indicating that these features encode related image information despite their different formulations. Eye-tracking data from human observers further suggest consistency between predicted fixation regions and early-stage gaze behavior. These findings highlight the value of combining structural and texture-based features for modeling visual search and support the development of perceptually informed observer models.
Abstract:We develop a new statistical ideal observer model that performs holistic visual search (or gist) processing in part by placing thresholds on minimum extractable image features. In this model, the ideal observer reduces the number of free parameters thereby shrinking down the system. The applications of this novel framework is in medical image perception (for optimizing imaging systems and algorithms), computer vision, benchmarking performance and enabling feature selection/evaluations. Other applications are in target detection and recognition in defense/security as well as evaluating sensors and detectors.