Abstract:Medical image segmentation is crucial for computer-aided diagnosis, yet privacy constraints hinder data sharing across institutions. Federated learning addresses this limitation, but existing approaches often rely on lightweight architectures that struggle with complex, heterogeneous data. Recently, the Segment Anything Model (SAM) has shown outstanding segmentation capabilities; however, its massive encoder poses significant challenges in federated settings. In this work, we present the first personalized federated SAM framework tailored for heterogeneous data scenarios in medical image segmentation. Our framework integrates two key innovations: (1) a personalized strategy that aggregates only the global parameters to capture cross-client commonalities while retaining the designed L-MoE (Localized Mixture-of-Experts) component to preserve domain-specific features; and (2) a decoupled global-local fine-tuning mechanism that leverages a teacher-student paradigm via knowledge distillation to bridge the gap between the global shared model and the personalized local models, thereby mitigating overgeneralization. Extensive experiments on two public datasets validate that our approach significantly improves segmentation performance, achieves robust cross-domain adaptation, and reduces communication overhead.
Abstract:Pre-trained code models rely heavily on high-quality pre-training data, particularly human-written reference comments that bridge code and natural language. However, these comments often become outdated as software evolves, degrading model performance. Large language models (LLMs) excel at generating high-quality code comments. We investigate whether replacing human-written comments with LLM-generated ones improves pre-training datasets. Since standard metrics cannot assess reference comment quality, we propose two novel reference-free evaluation tasks: code-comment inconsistency detection and semantic code search. Results show that LLM-generated comments are more semantically consistent with code than human-written ones, as confirmed by manual evaluation. Leveraging this finding, we rebuild the CodeSearchNet dataset with LLM-generated comments and re-pre-train CodeT5. Evaluations demonstrate that models trained on LLM-enhanced data outperform those using original human comments in code summarization, generation, and translation tasks. This work validates rebuilding pre-training datasets with LLMs to advance code intelligence, challenging the traditional reliance on human reference comments.
Abstract:In this paper, we focus on the task of instruction-based image editing. Previous works like InstructPix2Pix, InstructDiffusion, and SmartEdit have explored end-to-end editing. However, two limitations still remain: First, existing datasets suffer from low resolution, poor background consistency, and overly simplistic instructions. Second, current approaches mainly condition on the text while the rich image information is underexplored, therefore inferior in complex instruction following and maintaining background consistency. Targeting these issues, we first curated the AdvancedEdit dataset using a novel data construction pipeline, formulating a large-scale dataset with high visual quality, complex instructions, and good background consistency. Then, to further inject the rich image information, we introduce a two-stream bridging mechanism utilizing both the textual and visual features reasoned by the powerful Multimodal Large Language Models (MLLM) to guide the image editing process more precisely. Extensive results demonstrate that our approach, InsightEdit, achieves state-of-the-art performance, excelling in complex instruction following and maintaining high background consistency with the original image.
Abstract:Hypomimia is a non-motor symptom of Parkinson's disease that manifests as delayed facial movements and expressions, along with challenges in articulation and emotion. Currently, subjective evaluation by neurologists is the primary method for hypomimia detection, and conventional rehabilitation approaches heavily rely on verbal prompts from rehabilitation physicians. There remains a deficiency in accessible, user-friendly and scientifically rigorous assistive tools for hypomimia treatments. To investigate this, we developed HypomimaCoach, an Action Unit (AU)-based digital therapy system for hypomimia detection and rehabilitation in Parkinson's disease. The HypomimaCoach system was designed to facilitate engagement through the incorporation of both relaxed and controlled rehabilitation exercises, while also stimulating initiative through the integration of digital therapies that incorporated traditional face training methods. We extract action unit(AU) features and their relationship for hypomimia detection. In order to facilitate rehabilitation, a series of training programmes have been devised based on the Action Units (AUs) and patients are provided with real-time feedback through an additional AU recognition model, which guides them through their training routines. A pilot study was conducted with seven participants in China, all of whom exhibited symptoms of Parkinson's disease hypomimia. The results of the pilot study demonstrated a positive impact on participants' self-efficacy, with favourable feedback received. Furthermore, physician evaluations validated the system's applicability in a therapeutic setting for patients with Parkinson's disease, as well as its potential value in clinical applications.
Abstract:The quasipotential function allows for comprehension and prediction of the escape mechanisms from metastable states in nonlinear dynamical systems. This function acts as a natural extension of the potential function for non-gradient systems and it unveils important properties such as the maximum likelihood transition paths, transition rates and expected exit times of the system. Here, we leverage on machine learning via the combination of two data-driven techniques, namely a neural network and a sparse regression algorithm, to obtain symbolic expressions of quasipotential functions. The key idea is first to determine an orthogonal decomposition of the vector field that governs the underlying dynamics using neural networks, then to interpret symbolically the downhill and circulatory components of the decomposition. These functions are regressed simultaneously with the addition of mathematical constraints. We show that our approach discovers a parsimonious quasipotential equation for an archetypal model with a known exact quasipotential and for the dynamics of a nanomechanical resonator. The analytical forms deliver direct access to the stability of the metastable states and predict rare events with significant computational advantages. Our data-driven approach is of interest for a wide range of applications in which to assess the fluctuating dynamics.
Abstract:In this paper, we propose a novel integrated sensing and communications (ISAC) framework for the sixth generation (6G) mobile networks, in which we decompose the real physical world into static environment, dynamic targets, and various object materials. The ubiquitous static environment occupies the vast majority of the physical world, for which we design static environment reconstruction (SER) scheme to obtain the layout and point cloud information of static buildings. The dynamic targets floating in static environments create the spatiotemporal transition of the physical world, for which we design comprehensive dynamic target sensing (DTS) scheme to detect, estimate, track, image and recognize the dynamic targets in real-time. The object materials enrich the electromagnetic laws of the physical world, for which we develop object material recognition (OMR) scheme to estimate the electromagnetic coefficient of the objects. Besides, to integrate these sensing functions into existing communications systems, we discuss the interference issues and corresponding solutions for ISAC cellular networks. Furthermore, we develop an ISAC hardware prototype platform that can reconstruct the environmental maps and sense the dynamic targets while maintaining communications services. With all these designs, the proposed ISAC framework can support multifarious emerging applications, such as digital twins, low altitude economy, internet of vehicles, marine management, deformation monitoring, etc.
Abstract:With the continuous advancement of vision language models (VLMs) technology, remarkable research achievements have emerged in the dermatology field, the fourth most prevalent human disease category. However, despite these advancements, VLM still faces "hallucination" in dermatological diagnosis, and due to the inherent complexity of dermatological conditions, existing tools offer relatively limited support for user comprehension. We propose SkinGEN, a diagnosis-to-generation framework that leverages the stable diffusion (SD) method to generate reference demonstrations from diagnosis results provided by VLM, thereby enhancing the visual explainability for users. Through extensive experiments with Low-Rank Adaptation (LoRA), we identify optimal strategies for skin condition image generation. We conduct a user study with 32 participants evaluating both the system performance and explainability. Results demonstrate that SkinGEN significantly improves users' comprehension of VLM predictions and fosters increased trust in the diagnostic process. This work paves the way for more transparent and user-centric VLM applications in dermatology and beyond.
Abstract:The committor function is a central object for quantifying the transitions between metastable states of dynamical systems. Recently, a number of computational methods based on deep neural networks have been developed for computing the high-dimensional committor function. The success of the methods relies on sampling adequate data for the transition, which still is a challenging task for complex systems at low temperatures. In this work, we propose a deep learning method with two novel adaptive sampling schemes (I and II). In the two schemes, the data are generated actively with a modified potential where the bias potential is constructed from the learned committor function. We theoretically demonstrate the advantages of the sampling schemes and show that the data in sampling scheme II are uniformly distributed along the transition tube. This makes a promising method for studying the transition of complex systems. The efficiency of the method is illustrated in high-dimensional systems including the alanine dipeptide and a solvated dimer system.
Abstract:Understanding the transition events between metastable states in complex systems is an important subject in the fields of computational physics, chemistry and biology. The transition pathway plays an important role in characterizing the mechanism underlying the transition, for example, in the study of conformational changes of bio-molecules. In fact, computing the transition pathway is a challenging task for complex and high-dimensional systems. In this work, we formulate the path-finding task as a cost minimization problem over a particular path space. The cost function is adapted from the Freidlin-Wentzell action functional so that it is able to deal with rough potential landscapes. The path-finding problem is then solved using a actor-critic method based on the deep deterministic policy gradient algorithm (DDPG). The method incorporates the potential force of the system in the policy for generating episodes and combines physical properties of the system with the learning process for molecular systems. The exploitation and exploration nature of reinforcement learning enables the method to efficiently sample the transition events and compute the globally optimal transition pathway. We illustrate the effectiveness of the proposed method using three benchmark systems including an extended Mueller system and the Lennard-Jones system of seven particles.
Abstract:Graves' disease is a common condition that is diagnosed clinically by determining the smoothness of the thyroid texture and its morphology in ultrasound images. Currently, the most widely used approach for the automated diagnosis of Graves' disease utilizes Convolutional Neural Networks (CNNs) for both feature extraction and classification. However, these methods demonstrate limited efficacy in capturing texture features. Given the high capacity of wavelets in describing texture features, this research integrates learnable wavelet modules utilizing the Lifting Scheme into CNNs and incorporates a parallel wavelet branch into the ResNet18 model to enhance texture feature extraction. Our model can analyze texture features in spatial and frequency domains simultaneously, leading to optimized classification accuracy. We conducted experiments on collected ultrasound datasets and publicly available natural image texture datasets, our proposed network achieved 97.27% accuracy and 95.60% recall on ultrasound datasets, 60.765% accuracy on natural image texture datasets, surpassing the accuracy of ResNet and conrming the effectiveness of our approach.