Tao Zhou

A hybrid FEM-PINN method for time-dependent partial differential equations

Sep 04, 2024

Xiaodong Feng, Haojiong Shangguan, Tao Tang, Xiaoliang Wan, Tao Zhou

Abstract: In this work, we present a hybrid numerical method for solving evolution partial differential equations (PDEs) by merging the time finite element method with deep neural networks. In contrast to the conventional deep learning-based formulation where the neural network is defined on a spatiotemporal domain, our methodology utilizes finite element basis functions in the time direction where the space-dependent coefficients are defined as the output of a neural network. We then apply the Galerkin or collocation projection in the time direction to obtain a system of PDEs for the space-dependent coefficients which is approximated in the framework of PINN. The advantages of such a hybrid formulation are twofold: statistical errors are avoided for the integral in the time direction, and the neural network's output can be regarded as a set of reduced spatial basis functions. To further alleviate the difficulties from high dimensionality and low regularity, we have developed an adaptive sampling strategy that refines the training set. More specifically, we use an explicit density model to approximate the distribution induced by the PDE residual and then augment the training set with new time-dependent random samples given by the learned density model. The effectiveness and efficiency of our proposed method have been demonstrated through a series of numerical experiments.

* 25 pages
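
The representation described in this abstract, finite element basis functions in time multiplied by space-dependent coefficients, can be sketched roughly as follows. This is only an illustration under stated assumptions: the basis is taken to be piecewise-linear hat functions, and `spatial_coefficients` is a hypothetical stand-in for the neural network output, not the authors' model. The Galerkin or collocation projection and the PINN training are not shown.

```python
import numpy as np

def hat_basis(t, nodes):
    """Evaluate piecewise-linear (hat) basis functions at times t.

    Returns an array of shape (len(t), len(nodes)); column j holds phi_j(t).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    nodes = np.asarray(nodes, dtype=float)
    basis = np.zeros((t.size, nodes.size))
    for j in range(nodes.size):
        # Interpolating the j-th unit vector over the nodes gives phi_j.
        unit = np.zeros(nodes.size)
        unit[j] = 1.0
        basis[:, j] = np.interp(t, nodes, unit)
    return basis

def spatial_coefficients(x, n_basis):
    """Hypothetical stand-in for the network outputs omega_j(x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    freqs = np.arange(1, n_basis + 1)
    return np.sin(np.outer(x, freqs))  # shape (len(x), n_basis)

def evaluate(x, t, nodes):
    """u(x, t) = sum_j phi_j(t) * omega_j(x), evaluated on the x-by-t grid."""
    phi = hat_basis(t, nodes)                    # (n_t, n_basis)
    omega = spatial_coefficients(x, len(nodes))  # (n_x, n_basis)
    return omega @ phi.T                         # (n_x, n_t)
```

Because the time dependence is carried entirely by the fixed basis, integrals in time can be computed by quadrature on the time mesh instead of by random sampling, which is the first advantage the abstract names.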

DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

Aug 29, 2024

Kai Du, Yongle Xie, Tao Zhou, Yuancheng Zhou

Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool to solve mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new method (deepSPoC) that combines the interacting particle system of SPoC and deep learning. Under the framework of deepSPoC, two classes of frequently used deep models, fully connected neural networks and normalizing flows, are considered. For high-dimensional problems, spatially adaptive methods are designed to further improve the accuracy and efficiency of deepSPoC. We analyze the convergence of the deepSPoC framework under some simplified conditions and also provide an a posteriori error estimation for the algorithm. Finally, we test our methods on a wide range of different types of mean-field equations.

SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues

Jun 27, 2024

Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen

Abstract: Weakly-supervised medical image segmentation is a challenging task that aims to reduce the annotation cost while maintaining segmentation performance. In this paper, we present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels and simultaneously studies cross-modal fusion in training segmentation models. Our contribution consists of two key components: an effective Textual-to-Visual Cue Converter that produces visual prompts from text prompts on medical images, and a text-guided segmentation model with Text-Vision Hybrid Attention that fuses text and image features. We evaluate our framework on two medical image segmentation tasks, colonic polyp segmentation and MRI brain tumor segmentation, and achieve consistent state-of-the-art performance.

Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

Jun 03, 2024

Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).

* Project Link: https://sam-auxol.github.io/AuxOL/

Predicting ptychography probe positions using single-shot phase retrieval neural network

May 31, 2024

Ming Du, Tao Zhou, Junjing Deng, Daniel J. Ching, Steven Henke, Mathew J. Cherukara

Abstract: Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions, which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization approaches. When the error accumulates along the scan path or when the error magnitude is large, these approaches may not converge to a satisfactory result. We propose a fundamentally new approach for ptychography probe position prediction for data with large position errors, where a neural network is used to perform single-shot phase retrieval on individual diffraction patterns, yielding the object image at each scan point. The pairwise offsets among these images are then found using a robust image registration method, and the results are combined to yield the complete scan path by constructing and solving a linear equation. We show that our method can achieve good position prediction accuracy for data with large and accumulating errors on the order of $10^2$ pixels, a magnitude that often makes optimization-based algorithms fail to converge. For ptychography instruments without sophisticated position control equipment such as interferometers, our method is of significant practical potential.
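
The last step described in this abstract, combining pairwise offsets into a complete scan path by solving a linear equation, can be illustrated with a generic least-squares formulation. This is a sketch under assumptions: the abstract does not specify the exact system the authors build, and the function name `positions_from_offsets` and the choice to pin the first position at the origin are illustrative only.

```python
import numpy as np

def positions_from_offsets(n_points, pairs, offsets):
    """Recover scan positions from pairwise offsets by linear least squares.

    pairs[k] = (i, j) and offsets[k] is approximately p_j - p_i.
    The first position is pinned to the origin to remove the global shift.
    """
    offsets = np.asarray(offsets, dtype=float)
    n_eq = len(pairs)
    A = np.zeros((n_eq + 1, n_points))
    b = np.zeros((n_eq + 1, offsets.shape[1]))
    for k, (i, j) in enumerate(pairs):
        A[k, i] = -1.0
        A[k, j] = 1.0
        b[k] = offsets[k]
    A[n_eq, 0] = 1.0  # anchor row: p_0 = 0
    positions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return positions
```

Using offsets between non-adjacent scan points as well as adjacent ones makes the system overdetermined, so a single bad registration is averaged against the others rather than propagated along the path.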

MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise

May 20, 2024

Ruiqi Wu, Chenran Zhang, Jianle Zhang, Yi Zhou, Tao Zhou, Huazhu Fu

Abstract: Current fundus image analysis models are predominantly built for specific tasks relying on individual datasets. The learning process is usually based on a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model which incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and a mixed training strategy to infuse expert knowledge. Our proposed fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and holds excellent generalization ability in zero-shot and few-shot scenarios. MM-Retinal and KeepFIT are available at https://github.com/lxirich/MM-Retinal.

* Early Accepted by The International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

Community Detection for Heterogeneous Multiple Social Networks

May 07, 2024

Ziqing Zhu, Guan Yuan, Tao Zhou, Jiuxin Cao

Abstract: The community plays a crucial role in understanding user behavior and network characteristics in social networks. Some users use multiple social networks at once for a variety of objectives. These users are called overlapping users, and they bridge different social networks. Detecting communities across multiple social networks is vital for interaction mining, information diffusion, and behavior migration analysis among networks. This paper presents a community detection method based on nonnegative matrix tri-factorization for multiple heterogeneous social networks, which formulates a common consensus matrix to represent the global fused community. Specifically, the proposed method involves creating adjacency matrices based on network structure and content similarity, followed by alignment matrices which distinguish overlapping users in different social networks. With the generated alignment matrices, the method can enhance the fusion degree of the global community by detecting overlapping user communities across networks. The effectiveness of the proposed method is evaluated with new metrics on Twitter, Instagram, and Tumblr datasets. The results of the experiments demonstrate its superior performance in terms of community quality and community fusion.

* This paper was accepted by IEEE Transactions on Computational Social Systems (TCSS)

SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising

Mar 07, 2024

Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen

Abstract: Recently, promptable segmentation models, such as the Segment Anything Model (SAM), have demonstrated robust zero-shot generalization capabilities on static images. These promptable models exhibit denoising abilities for imprecise prompt inputs, such as imprecise bounding boxes. In this paper, we explore the potential of applying SAM to track and segment objects in videos where we recognize the tracking task as a prompt denoising task. Specifically, we iteratively propagate the bounding box of each object's mask in the preceding frame as the prompt for the next frame. Furthermore, to enhance SAM's denoising capability against position and size variations, we propose a multi-prompt strategy where we provide multiple jittered and scaled box prompts for each object and preserve the mask prediction with the highest semantic similarity to the template mask. We also introduce a point-based refinement stage to handle occlusions and reduce cumulative errors. Without involving tracking modules, our approach demonstrates comparable performance in video object/instance segmentation tasks on three datasets: DAVIS2017, YouTubeVOS2018, and UVO, serving as a concise baseline and endowing SAM-based downstream applications with tracking capabilities.
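
The multi-prompt strategy in this abstract, several jittered and scaled box prompts per object with the best prediction kept, might look roughly like the sketch below. Everything named here is an assumption for illustration: `predict_mask` and `similarity` are hypothetical callables supplied by the caller (standing in for SAM and for the semantic similarity measure), the jitter ranges are arbitrary, and `box_iou` is only a simple stand-in score for trying the code out.

```python
import random

def jittered_boxes(box, n, max_shift=0.1, max_scale=0.2, seed=0):
    """Generate n shifted and rescaled copies of box = (x0, y0, x1, y1)."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + w / 2, y0 + h / 2
    out = []
    for _ in range(n):
        dx = rng.uniform(-max_shift, max_shift) * w   # horizontal jitter
        dy = rng.uniform(-max_shift, max_shift) * h   # vertical jitter
        s = 1.0 + rng.uniform(-max_scale, max_scale)  # size change
        nw, nh = w * s, h * s
        out.append((cx + dx - nw / 2, cy + dy - nh / 2,
                    cx + dx + nw / 2, cy + dy + nh / 2))
    return out

def box_iou(a, b):
    """Intersection over union of two boxes; a stand-in similarity score."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def track_step(prev_box, template, predict_mask, similarity, n_prompts=8):
    """Prompt with the previous box plus jittered copies; keep the best result."""
    prompts = [prev_box] + jittered_boxes(prev_box, n_prompts)
    predictions = [predict_mask(p) for p in prompts]
    return max(predictions, key=lambda m: similarity(m, template))
```

In a tracking loop, the bounding box of the returned prediction would become `prev_box` for the next frame, following the propagation step the abstract describes.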

Energy based diffusion generator for efficient sampling of Boltzmann distributions

Jan 04, 2024

Yan Wang, Ling Guo, Hao Wu, Tao Zhou

Abstract: We introduce a novel sampler called the energy based diffusion generator for generating samples from arbitrary target distributions. The sampling model employs a structure similar to a variational autoencoder, utilizing a decoder to transform latent variables from a simple distribution into random variables approximating the target distribution, and we design an encoder based on the diffusion model. Leveraging the powerful modeling capacity of the diffusion model for complex distributions, we can obtain an accurate variational estimate of the Kullback-Leibler divergence between the distributions of the generated samples and the target. Moreover, we propose a decoder based on generalized Hamiltonian dynamics to further enhance sampling performance. Through empirical evaluation, we demonstrate the effectiveness of our method across various complex distribution functions, showcasing its superiority compared to existing methods.

PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model

Jan 04, 2024

Chenghao Huang, Yanbo Cao, Yinlong Wen, Tao Zhou, Yanru Zhang

Abstract: Poker, also known as Texas Hold'em, has always been a typical research target within imperfect information games (IIGs). IIGs have long served as a measure of artificial intelligence (AI) development. Representative prior works, such as DeepStack and Libratus, heavily rely on counterfactual regret minimization (CFR) to tackle heads-up no-limit Poker. However, it is challenging for subsequent researchers to learn CFR from previous models and apply it to other real-world applications due to the expensive computational cost of CFR iterations. Additionally, CFR is difficult to apply to multi-player games due to the exponential growth of the game tree size. In this work, we introduce PokerGPT, an end-to-end solver for playing Texas Hold'em with an arbitrary number of players and gaining high win rates, built on a lightweight large language model (LLM). PokerGPT only requires simple textual information about Poker games to generate decision-making advice, thus enabling convenient interaction between AI and humans. We transform a set of textual records acquired from real games into prompts, and use them to fine-tune a lightweight pre-trained LLM using reinforcement learning from human feedback. To improve fine-tuning performance, we conduct prompt engineering on raw data, including filtering useful information, selecting behaviors of players with high win rates, and further processing them into textual instructions using multiple prompt engineering techniques. Through the experiments, we demonstrate that PokerGPT outperforms previous approaches in terms of win rate, model size, training time, and response speed, indicating the great potential of LLMs in solving IIGs.
