Abstract:Detecting retinal image analysis, particularly the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose often are coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured image processing technique. Additionally, we offer an open-source annotation tool and a benchmark dataset comprising 40 images annotated with retinal branching angles. Our methodology for retinal branching angle detection and calculation is detailed, followed by a benchmark analysis comparing our method with previous approaches. The results indicate that our method is robust under various conditions with high accuracy and efficiency, which offers a valuable instrument for ophthalmic research and clinical applications.
Abstract:Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. However, few MIL methods have aimed at diversity modeling, which empirically show inferior performance but with a high computational cost. To bridge this gap, we propose a novel MIL aggregation method based on diverse global representation (DGR-MIL), by modeling diversity among instances through a set of global vectors that serve as a summary of all instances. First, we turn the instance correlation into the similarity between instance embeddings and the predefined global vectors through a cross-attention mechanism. This stems from the fact that similar instance embeddings typically would result in a higher correlation with a certain global vector. Second, we propose two mechanisms to enforce the diversity among the global vectors to be more descriptive of the entire bag: (i) positive instance alignment and (ii) a novel, efficient, and theoretically guaranteed diversification learning paradigm. Specifically, the positive instance alignment module encourages the global vectors to align with the center of positive instances (e.g., instances containing tumors in WSI). To further diversify the global representations, we propose a novel diversification learning paradigm leveraging the determinantal point process. The proposed model outperforms the state-of-the-art MIL aggregation models by a substantial margin on the CAMELYON-16 and the TCGA-lung cancer datasets. The code is available at \url{https://github.com/ChongQingNoSubway/DGR-MIL}.
Abstract:Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important factors that potentially affect its performance: (i) irrelative feature learned caused by asymmetric supervision; (ii) feature redundancy in the feature map. To this end, we propose to balance the supervision between encoder and decoder and reduce the redundant information in the UNet. Specifically, we use the feature map that contains the most semantic information (i.e., the last layer of the decoder) to provide additional supervision to other blocks to provide additional supervision and reduce feature redundancy by leveraging feature distillation. The proposed method can be easily integrated into existing UNet architecture in a plug-and-play fashion with negligible computational cost. The experimental results suggest that the proposed method consistently improves the performance of standard UNets on four medical image segmentation datasets. The code is available at \url{https://github.com/ChongQingNoSubway/SelfReg-UNet}
Abstract:Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., diseases-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC.
Abstract:Inverse imaging problems (IIPs) arise in various applications, with the main objective of reconstructing an image from its compressed measurements. This problem is often ill-posed for being under-determined with multiple interchangeably consistent solutions. The best solution inherently depends on prior knowledge or assumptions, such as the sparsity of the image. Furthermore, the reconstruction process for most IIPs relies significantly on the imaging (i.e. forward model) parameters, which might not be fully known, or the measurement device may undergo calibration drifts. These uncertainties in the forward model create substantial challenges, where inaccurate reconstructions usually happen when the postulated parameters of the forward model do not fully match the actual ones. In this work, we devoted to tackling accurate reconstruction under the context of a set of possible forward model parameters that exist. Here, we propose a novel Moment-Aggregation (MA) framework that is compatible with the popular IIP solution by using a neural network prior. Specifically, our method can reconstruct the signal by considering all candidate parameters of the forward model simultaneously during the update of the neural network. We theoretically demonstrate the convergence of the MA framework, which has a similar complexity with reconstruction under the known forward model parameters. Proof-of-concept experiments demonstrate that the proposed MA achieves performance comparable to the forward model with the known precise parameter in reconstruction across both compressive sensing and phase retrieval applications, with a PSNR gap of 0.17 to 1.94 over various datasets, including MNIST, X-ray, Glas, and MoNuseg. This highlights our method's significant potential in reconstruction under an uncertain forward model.
Abstract:Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these solutions to human users since human navigation introduces unique challenges, including the unpredictable nature of human movement, limited processing capabilities of portable devices, and the need for directional responsiveness due to the limited perception range of humans. In this project, we introduce an image-only method that applies motion analysis using optical flow with ego-motion compensation to predict Motor Focus-where and how humans or machines focus their movement intentions. Meanwhile, this paper addresses the camera shaking issue in handheld and body-mounted devices which can severely degrade performance and accuracy, by applying a Gaussian aggregation to stabilize the predicted motor focus area and enhance the prediction accuracy of movement direction. This also provides a robust, real-time solution that adapts to the user's immediate environment. Furthermore, in the experiments part, we show the qualitative analysis of motor focus estimation between the conventional dense optical flow-based method and the proposed method. In quantitative tests, we show the performance of the proposed method on a collected small dataset that is specialized for motor focus estimation tasks.
Abstract:Sharing and joint processing of camera feeds and sensor measurements, known as Cooperative Perception (CP), has emerged as a new technique to achieve higher perception qualities. CP can enhance the safety of Autonomous Vehicles (AVs) where their individual visual perception quality is compromised by adverse weather conditions (haze as foggy weather), low illumination, winding roads, and crowded traffic. To cover the limitations of former methods, in this paper, we propose a novel approach to realize an optimized CP under constrained communications. At the core of our approach is recruiting the best helper from the available list of front vehicles to augment the visual range and enhance the Object Detection (OD) accuracy of the ego vehicle. In this two-step process, we first select the helper vehicles that contribute the most to CP based on their visual range and lowest motion blur. Next, we implement a radio block optimization among the candidate vehicles to further improve communication efficiency. We specifically focus on pedestrian detection as an exemplary scenario. To validate our approach, we used the CARLA simulator to create a dataset of annotated videos for different driving scenarios where pedestrian detection is challenging for an AV with compromised vision. Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception in challenging scenarios, substantially improving driving safety under adverse conditions. Finally, we note that the networking assumptions are adopted from LTE Release 14 Mode 4 side-link communication, commonly used for Vehicle-to-Vehicle (V2V) communication. Nonetheless, our method is flexible and applicable to arbitrary V2V communications.
Abstract:This paper explores the potential of Large Language Models(LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered descriptions emphasizing abnormalities, assist in safe visual navigation in complex circumstances. Moreover, our proposed framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve the dynamic scenario switch, which allows users to transition smoothly from scene to scene, which addresses the limitation of traditional visual navigation. Furthermore, this paper explored the performance contribution of different prompt components, provided the vision for future improvement in visual accessibility, and paved the way for LLMs in video anomaly detection and vision-language understanding.
Abstract:The rise of machine learning in recent years has brought benefits to various research fields such as wide fire detection. Nevertheless, small object detection and rare object detection remain a challenge. To address this problem, we present a dataset automata that can generate ground truth paired datasets using diffusion models. Specifically, we introduce a mask-guided diffusion framework that can fusion the wildfire into the existing images while the flame position and size can be precisely controlled. In advance, to fill the gap that the dataset of wildfire images in specific scenarios is missing, we vary the background of synthesized images by controlling both the text prompt and input image. Furthermore, to solve the color tint problem or the well-known domain shift issue, we apply the CLIP model to filter the generated massive dataset to preserve quality. Thus, our proposed framework can generate a massive dataset of that images are high-quality and ground truth-paired, which well addresses the needs of the annotated datasets in specific tasks.
Abstract:This paper presents a comprehensive literature review of the current state of in-vehicle human-computer interaction (HCI) in the context of self-driving vehicles, with a specific focus on inclusion and accessibility. This study's aim is to examine the user-centered design principles for inclusive HCI in self-driving vehicles, evaluate existing HCI systems, and identify emerging technologies that have the potential to enhance the passenger experience. The paper begins by providing an overview of the current state of self-driving vehicle technology, followed by an examination of the importance of HCI in this context. Next, the paper reviews the existing literature on inclusive HCI design principles and evaluates the effectiveness of current HCI systems in self-driving vehicles. The paper also identifies emerging technologies that have the potential to enhance the passenger experience, such as voice-activated interfaces, haptic feedback systems, and augmented reality displays. Finally, the paper proposes an end-to-end design framework for the development of an inclusive in-vehicle experience, which takes into consideration the needs of all passengers, including those with disabilities, or other accessibility requirements. This literature review highlights the importance of user-centered design principles in the development of HCI systems for self-driving vehicles and emphasizes the need for inclusive design to ensure that all passengers can safely and comfortably use these vehicles. The proposed end-to-end design framework provides a practical approach to achieving this goal and can serve as a valuable resource for designers, researchers, and policymakers in this field.