Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tao Song

Shanghai Jiao Tong University

Cross Group Attention and Group-wise Rolling for Multimodal Medical Image Synthesis

Nov 22, 2024

Tao Song, Yicheng Wu, Minhao Hu, Xiangde Luo, Linda Wei, Guotai Wang, Yi Guo, Feng Xu, Shaoting Zhang

Abstract:Multimodal MR image synthesis aims to generate missing modality image by fusing and mapping a few available MRI data. Most existing approaches typically adopt an image-to-image translation scheme. However, these methods often suffer from sub-optimal performance due to the spatial misalignment between different modalities while they are typically treated as input channels. Therefore, in this paper, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explores both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, groups are first pre-defined along the channel dimension and then we perform an adaptive rolling for the standard convolutional kernel to capture inter-modality spatial correspondences. At the same time, a cross-group attention module is introduced to fuse information across different channel groups, leading to better feature representation. We evaluated the effectiveness of our model on the publicly available IXI and BraTS2023 datasets, where the AGI-Net achieved state-of-the-art performance for multimodal MR image synthesis. Code will be released.

Via

Access Paper or Ask Questions

Phantom: Constraining Generative Artificial Intelligence Models for Practical Domain Specific Peripherals Trace Synthesizing

Nov 10, 2024

Zhibai Huang, Yihan Shen, Yongchen Xie, Zhixiang Wei, Yun wang, Fangxin Liu, Tao Song, Zhengwei Qi

Abstract:Peripheral Component Interconnect Express (PCIe) is the de facto interconnect standard for high-speed peripherals and CPUs. Prototyping and optimizing PCIe devices for emerging scenarios is an ongoing challenge. Since Transaction Layer Packets (TLPs) capture device-CPU interactions, it is crucial to analyze and generate realistic TLP traces for effective device design and optimization. Generative AI offers a promising approach for creating intricate, custom TLP traces necessary for PCIe hardware and software development. However, existing models often generate impractical traces due to the absence of PCIe-specific constraints, such as TLP ordering and causality. This paper presents Phantom, the first framework that treats TLP trace generation as a generative AI problem while incorporating PCIe-specific constraints. We validate Phantom's effectiveness by generating TLP traces for an actual PCIe network interface card. Experimental results show that Phantom produces practical, large-scale TLP traces, significantly outperforming existing models, with improvements of up to 1000$\times$ in task-specific metrics and up to 2.19$\times$ in Frechet Inception Distance (FID) compared to backbone-only methods.

Via

Access Paper or Ask Questions

Learning Partial Differential Equations with Deep Parallel Neural Operators

Sep 30, 2024

Qinglong Ma, Peizhi Zhao, Sen Wang, Tao Song

Figure 1 for Learning Partial Differential Equations with Deep Parallel Neural Operators

Figure 2 for Learning Partial Differential Equations with Deep Parallel Neural Operators

Figure 3 for Learning Partial Differential Equations with Deep Parallel Neural Operators

Figure 4 for Learning Partial Differential Equations with Deep Parallel Neural Operators

Abstract:In recent years, Solving partial differential equations has shifted the focus of traditional neural network studies from finite-dimensional Euclidean spaces to generalized functional spaces in research. A novel methodology is to learn an operator as a means of approximating the mapping between outputs. Currently, researchers have proposed a variety of operator architectures. Nevertheless, the majority of these architectures adopt an iterative update architecture, whereby a single operator is learned from the same function space. In practical physical science problems, the numerical solutions of partial differential equations are complex, and a serial single operator is unable to accurately approximate the intricate mapping between input and output. So, We propose a deep parallel operator model (DPNO) for efficiently and accurately solving partial differential equations. DPNO employs convolutional neural networks to extract local features and map data into distinct latent spaces. Designing a parallel block of double Fourier neural operators to solve the iterative error problem. DPNO approximates complex mappings between inputs and outputs by learning multiple operators in different potential spaces in parallel blocks. DPNO achieved the best performance on five of them, with an average improvement of 10.5\%, and ranked second on one dataset.

Via

Access Paper or Ask Questions

General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Aug 27, 2024

Sen Wang, Peizhi Zhao, Qinglong Ma, Tao Song

Figure 1 for General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Figure 2 for General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Figure 3 for General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Figure 4 for General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

Abstract:Physics-Informed Neural Networks (PINNs) have become a promising research direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approximation of boundary layers. In this manuscript, we propose the General-Kindred Physics-Informed Neural Network (GKPINN) for solving Singular Perturbation Differential Equations (SPDEs). This approach utilizes asymptotic analysis to acquire prior knowledge of the boundary layer from the equation and establishes a novel network to assist PINN in approximating the boundary layer. It is compared with traditional PINN by solving examples of one-dimensional, two-dimensional, and time-varying SPDE equations. The research findings underscore the exceptional performance of our novel approach, GKPINN, which delivers a remarkable enhancement in reducing the $L_2$ error by two to four orders of magnitude compared to the established PINN methodology. This significant improvement is accompanied by a substantial acceleration in convergence rates, without compromising the high precision that is critical for our applications. Furthermore, GKPINN still performs well in extreme cases with perturbation parameters of ${1\times10}^{-38}$, demonstrating its excellent generalization ability.

Via

Access Paper or Ask Questions

Poisoning with A Pill: Circumventing Detection in Federated Learning

Jul 22, 2024

Hanxi Guo, Hao Wang, Tao Song, Tianhang Zheng, Yang Hua, Haibing Guan, Xiangyu Zhang

Figure 1 for Poisoning with A Pill: Circumventing Detection in Federated Learning

Figure 2 for Poisoning with A Pill: Circumventing Detection in Federated Learning

Figure 3 for Poisoning with A Pill: Circumventing Detection in Federated Learning

Figure 4 for Poisoning with A Pill: Circumventing Detection in Federated Learning

Abstract:Without direct access to the client's data, federated learning (FL) is well-known for its unique strength in data privacy protection among existing distributed machine learning techniques. However, its distributive and iterative nature makes FL inherently vulnerable to various poisoning attacks. To counteract these threats, extensive defenses have been proposed to filter out malicious clients, using various detection metrics. Based on our analysis of existing attacks and defenses, we find that there is a lack of attention to model redundancy. In neural networks, various model parameters contribute differently to the model's performance. However, existing attacks in FL manipulate all the model update parameters with the same strategy, making them easily detectable by common defenses. Meanwhile, the defenses also tend to analyze the overall statistical features of the entire model updates, leaving room for sophisticated attacks. Based on these observations, this paper proposes a generic and attack-agnostic augmentation approach designed to enhance the effectiveness and stealthiness of existing FL poisoning attacks against detection in FL, pointing out the inherent flaws of existing defenses and exposing the necessity of fine-grained FL security. Specifically, we employ a three-stage methodology that strategically constructs, generates, and injects poison (generated by existing attacks) into a pill (a tiny subnet with a novel structure) during the FL training, named as pill construction, pill poisoning, and pill injection accordingly. Extensive experimental results show that FL poisoning attacks enhanced by our method can bypass all the popular defenses, and can gain an up to 7x error rate increase, as well as on average a more than 2x error rate increase on both IID and non-IID data, in both cross-silo and cross-device FL systems.

Via

Access Paper or Ask Questions

Enhancing Stability for Large Models Training in Constrained Bandwidth Networks

Jun 28, 2024

Yun Dai, Tejas Dharamsi, Byron Hsu, Tao Song, Hamed Firooz

Abstract:Training extremely large language models with billions of parameters is a computationally intensive task that pushes the limits of current data parallel training systems. While techniques like ZeRO++ have enabled efficient distributed training of such giant models on inexpensive low-bandwidth clusters, they can suffer from convergence issues due to potential race conditions in the hierarchical partitioning (hpZ) scheme employed to reduce cross-machine communication. In this work, we first show how these race conditions cause instability when training models with billions of parameters. We then propose a modification to the partitioning algorithm that addresses these convergence challenges while maintaining competitive training efficiency. Empirical evaluation on training the multi-billion parameters Falcon Models and Llama-2 models demonstrates the updated algorithm's ability to achieve reliable convergence on these massive models, where stock ZeRO++ hpZ fails to converge. The updated algorithm enables robust training of larger models with 98\% throughput and model training speed improvement without sacrificing the quality of convergence.

Via

Access Paper or Ask Questions

Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

Jun 01, 2024

Tao Song, Yuwei Fan, Chenlong Feng, Keyu Song, Chao Liu, Dongxiang Jiang

Figure 1 for Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

Figure 2 for Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

Figure 3 for Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

Figure 4 for Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

Abstract:This study explores the application of large language models (LLMs) with callable tools in energy and power engineering domain, focusing on gas path analysis of gas turbines. We developed a dual-agent tool-calling process to integrate expert knowledge, predefined tools, and LLM reasoning. We evaluated various LLMs, including LLama3, Qwen1.5 and GPT. Smaller models struggled with tool usage and parameter extraction, while larger models demonstrated favorable capabilities. All models faced challenges with complex, multi-component problems. Based on the test results, we infer that LLMs with nearly 100 billion parameters could meet professional scenario requirements with fine-tuning and advanced prompt design. Continued development are likely to enhance their accuracy and effectiveness, paving the way for more robust AI-driven solutions.

Via

Access Paper or Ask Questions

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

May 30, 2024

Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan

Figure 1 for Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Figure 2 for Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Figure 3 for Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Figure 4 for Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Abstract:Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting. We term the stage with generated noisy patterns as corruption stage. To understand this corruption stage, we begin by theoretically modeling the one-shot fine-tuning scenario, and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of this corruption stage: a narrowed learning distribution inherent in the nature of few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) on DMs with variational inference to implicitly broaden the learned distribution, and present that the learning target of the BNNs can be naturally regarded as an expectation of the diffusion loss and a further regularization with the pretrained DMs. This approach is highly compatible with current few-shot fine-tuning methods in DMs and does not introduce any extra inference costs. Experimental results demonstrate that our method significantly mitigates corruption, and improves the fidelity, quality and diversity of the generated images in both object-driven and subject-driven generation tasks.

* Preprint. Under review

Via

Access Paper or Ask Questions

Onset and offset weighted loss function for sound event detection

Mar 20, 2024

Tao Song

Abstract:In a typical sound event detection (SED) system, the existence of a sound event is detected at a frame level, and consecutive frames with the same event detected are combined as one sound event. The median filter is applied as a post-processing step to remove detection errors as much as possible. However, detection errors occurring around the onset and offset of a sound event are beyond the capacity of the median filter. To address this issue, an onset and offset weighted binary cross-entropy (OWBCE) loss function is proposed in this paper, which trains the DNN model to be more robust on frames around (a) onsets and offsets. Experiments are carried out in the context of DCASE 2022 task 4. Results show that OWBCE outperforms BCE when different models are considered. For a basic CRNN, relative improvements of 6.43% in event-F1, 1.96% in PSDS1, and 2.43% in PSDS2 can be achieved by OWBCE.

Via

Access Paper or Ask Questions

Frequency-aware convolution for sound event detection

Mar 20, 2024

Tao Song

Abstract:In sound event detection (SED), convolution neural networks (CNNs) are widely used to extract time-frequency patterns from the input spectrogram. However, features extracted by CNN can be insensitive to the shift of time-frequency patterns along the frequency axis. To address this issue, frequency dynamic convolution (FDY) has been proposed, which applies different kernels to different frequency components. Compared to the vannila CNN, FDY requires several times more parameters. In this paper, a more efficient solution named frequency-aware convolution (FAC) is proposed. In FAC, frequency-positional information is encoded in a vector and added to the input spectrogram. To match the amplitude of input, the encoding vector is scaled adaptively and channel-independently. Experiments are carried out in the context of DCASE 2022 task 4, and the results demonstrate that FAC can achieve comparable performance to that of FDY with only 515 additional parameters, while FDY requires 8.02 million additional parameters. The ablation study shows that scaling the encoding vector adaptively and channel-independently is critical to the performance of FAC.

Via

Access Paper or Ask Questions