Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University
Abstract:Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversely affects the model's semantic generation capabilities. In this paper, we introduce LocMoE+, a refined version of the low-overhead LocMoE, incorporating the following enhancements: (1) Quantification and definition of the affinity between experts and tokens. (2) Implementation of a global-level adaptive routing strategy to rearrange tokens based on their affinity scores. (3) Reestimation of the lower bound for expert capacity, which has been shown to progressively decrease as the token feature distribution evolves. Experimental results demonstrate that, without compromising model convergence or efficacy, the number of tokens each expert processes can be reduced by over 60%. Combined with communication optimizations, this leads to an average improvement in training efficiency ranging from 5.4% to 46.6%. After fine-tuning, LocMoE+ exhibits a performance improvement of 9.7% to 14.1% across the GDAD, C-Eval, and TeleQnA datasets.
Abstract:Multiple instance learning (MIL) has emerged as a popular method for classifying histopathology whole slide images (WSIs). Existing approaches typically rely on frozen pre-trained models to extract instance features, neglecting the substantial domain shift between pre-training natural and histopathological images. To address this issue, we propose PAMT, a novel Prompt-guided Adaptive Model Transformation framework that enhances MIL classification performance by seamlessly adapting pre-trained models to the specific characteristics of histopathology data. To capture the intricate histopathology distribution, we introduce Representative Patch Sampling (RPS) and Prototypical Visual Prompt (PVP) to reform the input data, building a compact while informative representation. Furthermore, to narrow the domain gap, we introduce Adaptive Model Transformation (AMT) that integrates adapter blocks within the feature extraction pipeline, enabling the pre-trained models to learn domain-specific features. We rigorously evaluate our approach on two publicly available datasets, Camelyon16 and TCGA-NSCLC, showcasing substantial improvements across various MIL models. Our findings affirm the potential of PAMT to set a new benchmark in WSI classification, underscoring the value of a targeted reprogramming approach.
Abstract:Accurate and robust classification of diseases is important for proper diagnosis and treatment. However, medical datasets often face challenges related to limited sample sizes and inherent imbalanced distributions, due to difficulties in data collection and variations in disease prevalence across different types. In this paper, we introduce an Iterative Online Image Synthesis (IOIS) framework to address the class imbalance problem in medical image classification. Our framework incorporates two key modules, namely Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS), which collectively target the imbalance classification issue at both the instance level and the class level. The OIS module alleviates the data insufficiency problem by generating representative samples tailored for online training of the classifier. On the other hand, the AAS module dynamically balances the synthesized samples among various classes, targeting those with low training accuracy. To evaluate the effectiveness of our proposed method in addressing imbalanced classification, we conduct experiments on the HAM10000 and APTOS datasets. The results obtained demonstrate the superiority of our approach over state-of-the-art methods as well as the effectiveness of each component. The source code will be released upon acceptance.
Abstract:The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-To-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-To-All procedure also significantly prolongs the training time. To alleviate the above performance problems, we propose a novel routing strategy that combines load balance and locality by converting partial inter-node communication to that of intra-node. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated through the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications on the PanGu-Sigma model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experiment results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as hash router and switch router, without impacting the model accuracy.
Abstract:Nuclei segmentation is a fundamental prerequisite in the digital pathology workflow. The development of automated methods for nuclei segmentation enables quantitative analysis of the wide existence and large variances in nuclei morphometry in histopathology images. However, manual annotation of tens of thousands of nuclei is tedious and time-consuming, which requires significant amount of human effort and domain-specific expertise. To alleviate this problem, in this paper, we propose a weakly-supervised nuclei segmentation method that only requires partial point labels of nuclei. Specifically, we propose a novel boundary mining framework for nuclei segmentation, named BoNuS, which simultaneously learns nuclei interior and boundary information from the point labels. To achieve this goal, we propose a novel boundary mining loss, which guides the model to learn the boundary information by exploring the pairwise pixel affinity in a multiple-instance learning manner. Then, we consider a more challenging problem, i.e., partial point label, where we propose a nuclei detection module with curriculum learning to detect the missing nuclei with prior morphological knowledge. The proposed method is validated on three public datasets, MoNuSeg, CPM, and CoNIC datasets. Experimental results demonstrate the superior performance of our method to the state-of-the-art weakly-supervised nuclei segmentation methods. Code: https://github.com/hust-linyi/bonus.
Abstract:Perching on the moving platforms is a promising solution to enhance the endurance and operational range of quadrotors, which could benefit the efficiency of a variety of air-ground cooperative tasks. To ensure robust perching, tracking with a steady relative state and reliable perception is a prerequisite. This paper presents an adaptive dynamic tracking and perching scheme for autonomous quadrotors to achieve tight integration with moving platforms. For reliable perception of dynamic targets, we introduce elastic visibility-aware planning to actively avoid occlusion and target loss. Additionally, we propose a flexible terminal adjustment method that adapts the changes in flight duration and the coupled terminal states, ensuring full-state synchronization with the time-varying perching surface at various angles. A relaxation strategy is developed by optimizing the tangential relative speed to address the dynamics and safety violations brought by hard boundary conditions. Moreover, we take SE(3) motion planning into account to ensure no collision between the quadrotor and the platform until the contact moment. Furthermore, we propose an efficient spatiotemporal trajectory optimization framework considering full state dynamics for tracking and perching. The proposed method is extensively tested through benchmark comparisons and ablation studies. To facilitate the application of academic research to industry and to validate the efficiency of our scheme under strictly limited computational resources, we deploy our system on a commercial drone (DJI-MAVIC3) with a full-size sport-utility vehicle (SUV). We conduct extensive real-world experiments, where the drone successfully tracks and perches at 30~km/h (8.3~m/s) on the top of the SUV, and at 3.5~m/s with 60{\deg} inclined into the trunk of the SUV.
Abstract:Radio speech echo is a specific phenomenon in the air traffic control (ATC) domain, which degrades speech quality and further impacts automatic speech recognition (ASR) accuracy. In this work, a recognition-oriented speech enhancement (ROSE) framework is proposed to improve speech intelligibility and also advance ASR accuracy, which serves as a plug-and-play tool in ATC scenarios and does not require additional retraining of the ASR model. Specifically, an encoder-decoder-based U-Net framework is proposed to eliminate the radio speech echo based on the real-world collected corpus. By incorporating the SE-oriented and ASR-oriented loss, ROSE is implemented in a multi-objective manner by learning shared representations across the two optimization objectives. An attention-based skip-fusion (ABSF) mechanism is applied to skip connections to refine the features. A channel and sequence attention (CSAtt) block is innovatively designed to guide the model to focus on informative representations and suppress disturbing features. The experimental results show that the ROSE significantly outperforms other state-of-the-art methods for both the SE and ASR tasks. In addition, the proposed approach can contribute to the desired performance improvements on public datasets.
Abstract:Deep learning based methods for medical images can be easily compromised by adversarial examples (AEs), posing a great security flaw in clinical decision-making. It has been discovered that conventional adversarial attacks like PGD which optimize the classification logits, are easy to distinguish in the feature space, resulting in accurate reactive defenses. To better understand this phenomenon and reassess the reliability of the reactive defenses for medical AEs, we thoroughly investigate the characteristic of conventional medical AEs. Specifically, we first theoretically prove that conventional adversarial attacks change the outputs by continuously optimizing vulnerable features in a fixed direction, thereby leading to outlier representations in the feature space. Then, a stress test is conducted to reveal the vulnerability of medical images, by comparing with natural images. Interestingly, this vulnerability is a double-edged sword, which can be exploited to hide AEs. We then propose a simple-yet-effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which assists to hide the adversarial feature in the target feature distribution. The proposed method is evaluated on three medical datasets, both 2D and 3D, with different modalities. The experimental results demonstrate the superiority of HFC, \emph{i.e.,} it bypasses an array of state-of-the-art adversarial medical AE detectors more efficiently than competing adaptive attacks, which reveals the deficiencies of medical reactive defense and allows to develop more robust defenses in future.
Abstract:Recently, Transformer-based methods have significantly improved state-of-the-art time series forecasting results, but they suffer from high computational costs and the inability to capture the long and short periodicity of time series. We present a highly accurate and simply structured CNN-based model for long-term time series forecasting tasks, called WinNet, including (i) Inter-Intra Period Encoder (I2PE) to transform 1D sequence into 2D tensor with long and short periodicity according to the predefined periodic window, (ii) Two-Dimensional Period Decomposition (TDPD) to model period-trend and oscillation terms, and (iii) Decomposition Correlation Block (DCB) to leverage the correlations of the period-trend and oscillation terms to support the prediction tasks by CNNs. Results on nine benchmark datasets show that the WinNet can achieve SOTA performance and lower computational complexity over CNN-, MLP-, Transformer-based approaches. The WinNet provides potential for the CNN-based methods in the time series forecasting tasks, with perfect tradeoff between performance and efficiency.
Abstract:Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of Cardiovascular Diseases (CVDs), the foremost cause of death globally. However, current techniques suffer incomplete and inaccurate motion estimation of the myocardium both in spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. In addressing these challenges, this paper introduces the Neural Cardiac Motion Field (NeuralCMF). NeuralCMF leverages the implicit neural representation (INR) to model the 3D structure and the comprehensive 6D forward/backward motion of the heart. This approach offers memory-efficient storage and continuous capability to query the precise shape and motion of the myocardium throughout the cardiac cycle at any specific point. Notably, NeuralCMF operates without the need for paired datasets, and its optimization is self-supervised through the physics knowledge priors both in space and time dimensions, ensuring compatibility with both 2D and 3D echocardiogram video inputs. Experimental validations across three representative datasets support the robustness and innovative nature of the NeuralCMF, marking significant advantages over existing state-of-the-arts in cardiac imaging and motion tracking.