We describe a new general method for segmentation in MRI scans using Topological Data Analysis (TDA), offering several advantages over traditional machine learning approaches. It works in three steps, first identifying the whole object to segment via automatic thresholding, then detecting a distinctive subset whose topology is known in advance, and finally deducing the various components of the segmentation. Although convoking classical ideas of TDA, such an algorithm has never been proposed separately from deep learning methods. To achieve this, our approach takes into account, in addition to the homology of the image, the localization of representative cycles, a piece of information that seems never to have been exploited in this context. In particular, it offers the ability to perform segmentation without the need for large annotated data sets. TDA also provides a more interpretable and stable framework for segmentation by explicitly mapping topological features to segmentation components. By adapting the geometric object to be detected, the algorithm can be adjusted to a wide range of data segmentation challenges. We carefully study the examples of glioblastoma segmentation in brain MRI, where a sphere is to be detected, as well as myocardium in cardiac MRI, involving a cylinder, and cortical plate detection in fetal brain MRI, whose 2D slices are circles. We compare our method to state-of-the-art algorithms.
The electromagnetic inverse problem has long been a research hotspot. This study aims to reverse radar view angles in synthetic aperture radar (SAR) images given a target model. Nonetheless, the scarcity of SAR data, combined with the intricate background interference and imaging mechanisms, limit the applications of existing learning-based approaches. To address these challenges, we propose an interactive deep reinforcement learning (DRL) framework, where an electromagnetic simulator named differentiable SAR render (DSR) is embedded to facilitate the interaction between the agent and the environment, simulating a human-like process of angle prediction. Specifically, DSR generates SAR images at arbitrary view angles in real-time. And the differences in sequential and semantic aspects between the view angle-corresponding images are leveraged to construct the state space in DRL, which effectively suppress the complex background interference, enhance the sensitivity to temporal variations, and improve the capability to capture fine-grained information. Additionally, in order to maintain the stability and convergence of our method, a series of reward mechanisms, such as memory difference, smoothing and boundary penalty, are utilized to form the final reward function. Extensive experiments performed on both simulated and real datasets demonstrate the effectiveness and robustness of our proposed method. When utilized in the cross-domain area, the proposed method greatly mitigates inconsistency between simulated and real domains, outperforming reference methods significantly.
In robust optimization problems, the magnitude of perturbations is relatively small. Consequently, solutions within certain regions are less likely to represent the robust optima when perturbations are introduced. Hence, a more efficient search process would benefit from increased opportunities to explore promising regions where global optima or good local optima are situated. In this paper, we introduce a novel robust evolutionary algorithm named the dual-stage robust evolutionary algorithm (DREA) aimed at discovering robust solutions. DREA operates in two stages: the peak-detection stage and the robust solution-searching stage. The primary objective of the peak-detection stage is to identify peaks in the fitness landscape of the original optimization problem. Conversely, the robust solution-searching stage focuses on swiftly identifying the robust optimal solution using information obtained from the peaks discovered in the initial stage. These two stages collectively enable the proposed DREA to efficiently obtain the robust optimal solution for the optimization problem. This approach achieves a balance between solution optimality and robustness by separating the search processes for optimal and robust optimal solutions. Experimental results demonstrate that DREA significantly outperforms five state-of-the-art algorithms across 18 test problems characterized by diverse complexities. Moreover, when evaluated on higher-dimensional robust optimization problems (100-$D$ and 200-$D$), DREA also demonstrates superior performance compared to all five counterpart algorithms.
Relative pose estimation for RGBD cameras is crucial in a number of applications. Previous approaches either rely on the RGB aspect of the images to estimate pose thus not fully making use of depth in the estimation process or estimate pose from the 3D cloud of points that each image produces, thus not making full use of RGB information. This paper shows that if one pair of correspondences is hypothesized from the RGB-based ranked-ordered correspondence list, then the space of remaining correspondences is restricted to corresponding pairs of curves nested around the hypothesized correspondence, implicitly capturing depth consistency. This simple Geometric Depth Constraint (GDC) significantly reduces potential matches. In effect this becomes a filter on possible correspondences that helps reduce the number of outliers and thus expedites RANSAC significantly. As such, the same budget of time allows for more RANSAC iterations and therefore additional robustness and a significant speedup. In addition, the paper proposed a Nested RANSAC approach that also speeds up the process, as shown through experiments on TUM, ICL-NUIM, and RGBD Scenes v2 datasets.
Recent releases of pre-trained Large Language Models (LLMs) have gained considerable traction, yet research on fine-tuning and employing domain-specific LLMs remains scarce. This study investigates approaches for fine-tuning and leveraging domain-specific LLMs, highlighting trends in LLMs, foundational models, and methods for domain-specific pre-training. Focusing on the financial sector, it details dataset selection, preprocessing, model choice, and considerations crucial for LLM fine-tuning in finance. Addressing the unique characteristics of financial data, the study explores the construction of domain-specific vocabularies and considerations for security and regulatory compliance. In the practical application of LLM fine-tuning, the study outlines the procedure and implementation for generating domain-specific LLMs in finance. Various financial cases, including stock price prediction, sentiment analysis of financial news, automated document processing, research, information extraction, and customer service enhancement, are exemplified. The study explores the potential of LLMs in the financial domain, identifies limitations, and proposes directions for improvement, contributing valuable insights for future research. Ultimately, it advances natural language processing technology in business, suggesting proactive LLM utilization in financial services across industries.
Recently, Visual Transformer (ViT) has been extensively used in medical image segmentation (MIS) due to applying self-attention mechanism in the spatial domain to modeling global knowledge. However, many studies have focused on improving models in the spatial domain while neglecting the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input features and assigns the external weight in the frequency domain, which is generated by our External Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets, and our approach demonstrates competitive performance, owing to its effective utilization of frequency domain information.
Recent work in Natural Language Processing and Computer Vision has been using textual information -- e.g., entity names and descriptions -- available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with this task; iii) present M-NTA, a novel unsupervised approach that combines MT, WS, and LLMs to generate high-quality textual information; and, iv) study the impact of increasing multilingual coverage and precision of non-English textual information in Entity Linking, Knowledge Graph Completion, and Question Answering. As part of our effort towards better multilingual knowledge graphs, we also introduce WikiKGE-10, the first human-curated benchmark to evaluate KGE approaches in 10 languages across 7 language families.
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noises similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Nevertheless, the traditional segmentation method of PE can not fully consider the hierarchical structure of features, local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder. And fuses features of different scales in the decoder subnetwork to compensate for spatial information loss caused by the inevitable downsampling in Swin-UNet or other state-of-the-art methods, effectively solving the above problem. We provide a theoretical analysis of this method in detail and validate it on publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
JPEG is a widely used compression scheme to efficiently reduce the volume of transmitted images. The artifacts appear among blocks due to the information loss, which not only affects the quality of images but also harms the subsequent high-level tasks in terms of feature drifting. High-level vision models trained on high-quality images will suffer performance degradation when dealing with compressed images, especially on mobile devices. Numerous learning-based JPEG artifact removal methods have been proposed to handle visual artifacts. However, it is not an ideal choice to use these JPEG artifact removal methods as a pre-processing for compressed image classification for the following reasons: 1. These methods are designed for human vision rather than high-level vision models; 2. These methods are not efficient enough to serve as pre-processing on resource-constrained devices. To address these issues, this paper proposes a novel lightweight AFD module to boost the performance of pre-trained image classification models when facing compressed images. First, a FDE-Net is devised to generate the spatial-wise FDM in the DCT domain. Next, the estimated FDM is transmitted to the FE-Net to generate the mapping relationship between degraded features and corresponding high-quality features. A simple but effective RepConv block equipped with structural re-parameterization is utilized in FE-Net, which enriches feature representation in the training phase while maintaining efficiency in the deployment phase. After training on limited compressed images, the AFD-Module can serve as a "plug-and-play" model for pre-trained classification models to improve their performance on compressed images. Experiments demonstrate that our proposed AFD module can comprehensively improve the accuracy of the pre-trained classification models and significantly outperform the existing methods.
The probability of an event is in the range of [0, 1]. In a sample space S, the value of probability determines whether an outcome is true or false. The probability of an event Pr(A) that will never occur = 0. The probability of the event Pr(B) that will certainly occur = 1. This makes both events A and B thus a certainty. Furthermore, the sum of probabilities Pr(E1) + Pr(E2) + ... + Pr(En) of a finite set of events in a given sample space S = 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. Firstly, this paper discusses Bayes' theorem, then complement of probability and the difference of probability for occurrences of learning-events, before applying these in the prediction of learning objects in student learning. Given the sum total of 1; to make recommendation for student learning, this paper submits that the difference of argMaxPr(S) and probability of student-performance quantifies the weight of learning objects for students. Using a dataset of skill-set, the computational procedure demonstrates: i) the probability of skill-set events that has occurred that would lead to higher level learning; ii) the probability of the events that has not occurred that requires subject-matter relearning; iii) accuracy of decision tree in the prediction of student performance into class labels; and iv) information entropy about skill-set data and its implication on student cognitive performance and recommendation of learning [1].