Optic disc and cup segmentation play a crucial role in automating the screening and diagnosis of optic glaucoma. While data-driven convolutional neural networks (CNNs) show promise in this area, the inherent ambiguity of segmenting object and background boundaries in the task of optic disc and cup segmentation leads to noisy annotations that impact model performance. To address this, we propose an innovative label-denoising method of Multiple Pseudo-labels Noise-aware Network (MPNN) for accurate optic disc and cup segmentation. Specifically, the Multiple Pseudo-labels Generation and Guided Denoising (MPGGD) module generates pseudo-labels by multiple different initialization networks trained on true labels, and the pixel-level consensus information extracted from these pseudo-labels guides to differentiate clean pixels from noisy pixels. The training framework of the MPNN is constructed by a teacher-student architecture to learn segmentation from clean pixels and noisy pixels. Particularly, such a framework adeptly leverages (i) reliable and fundamental insights from clean pixels and (ii) the supplementary knowledge within noisy pixels via multiple perturbation-based unsupervised consistency. Compared to other label-denoising methods, comprehensive experimental results on the RIGA dataset demonstrate our method's excellent performance and significant denoising ability.
Retinal disease is one of the primary causes of visual impairment, and early diagnosis is essential for preventing further deterioration. Nowadays, many works have explored Transformers for diagnosing diseases due to their strong visual representation capabilities. However, retinal diseases exhibit milder forms and often present with overlapping signs, which pose great difficulties for accurate multi-class classification. Therefore, we propose a new framework named Multi-Scale Patch Message Passing Swin Transformer for multi-class retinal disease classification. Specifically, we design a Patch Message Passing (PMP) module based on the Message Passing mechanism to establish global interaction for pathological semantic features and to exploit the subtle differences further between different diseases. Moreover, considering the various scale of pathological features we integrate multiple PMP modules for different patch sizes. For evaluation, we have constructed a new dataset, named OPTOS dataset, consisting of 1,033 high-resolution fundus images photographed by Optos camera and conducted comprehensive experiments to validate the efficacy of our proposed method. And the results on both the public dataset and our dataset demonstrate that our method achieves remarkable performance compared to state-of-the-art methods.
Medical image segmentation annotations exhibit variations among experts due to the ambiguous boundaries of segmented objects and backgrounds in medical images. Although using multiple annotations for each image in the fully-supervised has been extensively studied for training deep models, obtaining a large amount of multi-annotated data is challenging due to the substantial time and manpower costs required for segmentation annotations, resulting in most images lacking any annotations. To address this, we propose Multi-annotated Semi-supervised Ensemble Networks (MSE-Nets) for learning segmentation from limited multi-annotated and abundant unannotated data. Specifically, we introduce the Network Pairwise Consistency Enhancement (NPCE) module and Multi-Network Pseudo Supervised (MNPS) module to enhance MSE-Nets for the segmentation task by considering two major factors: (1) to optimize the utilization of all accessible multi-annotated data, the NPCE separates (dis)agreement annotations of multi-annotated data at the pixel level and handles agreement and disagreement annotations in different ways, (2) to mitigate the introduction of imprecise pseudo-labels, the MNPS extends the training data by leveraging consistent pseudo-labels from unannotated data. Finally, we improve confidence calibration by averaging the predictions of base networks. Experiments on the ISIC dataset show that we reduced the demand for multi-annotated data by 97.75\% and narrowed the gap with the best fully-supervised baseline to just a Jaccard index of 4\%. Furthermore, compared to other semi-supervised methods that rely only on a single annotation or a combined fusion approach, the comprehensive experimental results on ISIC and RIGA datasets demonstrate the superior performance of our proposed method in medical image segmentation with ambiguous boundaries.
Low-cost autonomous robots suffer from limited onboard computing power, resulting in excessive computation time when navigating in cluttered environments. This paper presents Edge Accelerated Robot Navigation, or EARN for short, to achieve real-time collision avoidance by adopting hierarchical motion planning (HMP). In contrast to existing local or edge motion planning solutions that ignore the interdependency between low-level motion planning and high-level resource allocation, EARN adopts model predictive switching (MPS) that maximizes the expected switching gain w.r.t. robot states and actions under computation and communication resource constraints. As such, each robot can dynamically switch between a point-mass motion planner executed locally to guarantee safety (e.g., path-following) and a full-shape motion planner executed non-locally to guarantee efficiency (e.g., overtaking). The crux to EARN is a two-time scale integrated decision-planning algorithm based on bilevel mixed-integer optimization, and a fast conditional collision avoidance algorithm based on penalty dual decomposition. We validate the performance of EARN in indoor simulation, outdoor simulation, and real-world environments. Experiments show that EARN achieves significantly smaller navigation time and collision ratios than state-of-the-art navigation approaches.
Recently, multi-modal content generation has attracted lots of attention from researchers by investigating the utilization of visual instruction tuning based on large language models (LLMs). To enhance the performance and generalization ability of such LLMs, the practice of distilling knowledge from pretrained multi-modal models (a.k.a. teachers) to more compact multi-modal LLMs (students) has gained considerable interest. However, the prevailing paradigm of instructiontuning in multi-modal LLMs knowledge distillation is resource-intensive and unidirectional, neglecting the potential for mutual feedback between the student and teacher models. Thus, we propose an innovative Competitive Multi-modal Distillation framework (CoMD), which captures bidirectional feedback between teacher and student models and continually updates the multi-modal capabilities that the student model has learned. It comprises two stages: multi-modal pre-training and multi-modal competitive distillation. The first stage pre-trains the student model on a large number of filtered multi-modal datasets. The second stage facilitates a bidirectional knowledge transfer between the student and teacher models. Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model. Finally, the 7B-sized student model after four distillations surpassed the current state-of-the-art model LLaVA-13B on the ScienceQA and LLaVA Test dataset, also outperforms other strong baselines in the zero-shot setting.
In this paper, a computer-vision-assisted simulation method is proposed to address the issue of training dataset acquisition for wireless hand gesture recognition. In the existing literature, in order to classify gestures via the wireless channel estimation, massive training samples should be measured in a consistent environment, consuming significant efforts. In the proposed CASTER simulator, however, the training dataset can be simulated via existing videos. Particularly, a gesture is represented by a sequence of snapshots, and the channel impulse response of each snapshot is calculated via tracing the rays scattered off a primitive-based hand model. Moreover, CASTER simulator relies on the existing videos to extract the motion data of gestures. Thus, the massive measurements of wireless channel can be eliminated. The experiments demonstrate a 90.8% average classification accuracy of simulation-to-reality inference.
This letter investigates the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems, considering that the private messages intended by a terminal can be eavesdropped by any others from the broadcast signals. Specifically, the considered system has an N-antenna satellite and numerous single-antenna land users. Maximum ratio transmission (MRT) and matched-filtering (MF) precoding techniques are adopted at the satellite separately for the common messages (CMs) and for the private messages (PMs), which are both implemented based on the estimated LMS channels suffering from the Shadowed-Rician fading. Then, closed-form expressions are derived for the ergodic rates for decoding the CM, and for decoding the PM at the intended user respectively, and more importantly, we also derive the ergodic secrecy rate against eavesdropping. Finally, numerical results are provided to validate the correctness of the proposed analysis models, as well as to show some interesting comparisons.
Precise Tooth Cone Beam Computed Tomography (CBCT) image segmentation is crucial for orthodontic treatment planning. In this paper, we propose FDNet, a Feature Decoupled Segmentation Network, to excel in the face of the variable dental conditions encountered in CBCT scans, such as complex artifacts and indistinct tooth boundaries. The Low-Frequency Wavelet Transform (LF-Wavelet) is employed to enrich the semantic content by emphasizing the global structural integrity of the teeth, while the SAM encoder is leveraged to refine the boundary delineation, thus improving the contrast between adjacent dental structures. By integrating these dual aspects, FDNet adeptly addresses the semantic gap, providing a detailed and accurate segmentation. The framework's effectiveness is validated through rigorous benchmarks, achieving the top Dice and IoU scores of 85.28% and 75.23%, respectively. This innovative decoupling of semantic and boundary features capitalizes on the unique strengths of each element to significantly elevate the quality of segmentation performance.
Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval may seem unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems.
Federated learning (FL) has been recognized as a rapidly growing research area, where the model is trained over massively distributed clients under the orchestration of a parameter server (PS) without sharing clients' data. This paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, that are prevalent in FL applications but challenging to handle due to their intricate non-convexity and non-smoothness nature and the conflicting requirements on communication efficiency and privacy protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored for non-convex and non-smooth FL problems, and differential privacy is applied for strong privacy guarantee. Its unique insightful properties and some privacy and convergence analyses are also presented for the FL algorithm design guidelines. Extensive experiments on real-world data are conducted to demonstrate the effectiveness of the proposed algorithm and much superior performance than some state-of-the-art FL algorithms, together with the validation of all the analytical results and properties.