Hu Wang

In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models

Feb 27, 2025

Hu Wang, Ibrahim Almakky, Congbo Ma, Numan Saeed, Mohammad Yaqub

Abstract: Model merging is an effective strategy for combining multiple models to enhance performance, and it is more efficient than ensemble learning because it introduces no extra computation at inference. However, limited research explores whether the merging process can occur within a single model and enhance that model's robustness, which is particularly critical in the medical imaging domain. In this paper, we are the first to propose in-model merging (InMerge), a novel approach that enhances a model's robustness by selectively merging similar convolutional kernels in the deep layers of a single convolutional neural network (CNN) during training for classification. We also analytically reveal important characteristics that affect how in-model merging should be performed, serving as an insightful reference for the community. We demonstrate the feasibility and effectiveness of this technique for different CNN architectures on 4 prevalent datasets. The proposed InMerge-trained model surpasses the typically-trained model by a substantial margin. The code will be made public.
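
The idea of merging similar kernels inside one layer can be illustrated with a minimal sketch. This is not the authors' InMerge implementation: the cosine-similarity criterion, the greedy pairing, the `threshold` value, and the function names are all assumptions made for illustration, and kernels are represented as flattened lists of floats.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flattened kernels."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_similar_kernels(kernels, threshold=0.9):
    """Greedily average each kernel with its most similar unmerged partner.

    Hypothetical helper: a pair is merged only if its cosine similarity is
    at least `threshold`. Both members of a merged pair receive the averaged
    values; the input list is left unchanged.
    """
    merged = [list(k) for k in kernels]
    used = set()
    for i in range(len(kernels)):
        if i in used:
            continue
        best_j, best_sim = None, threshold
        for j in range(i + 1, len(kernels)):
            if j in used:
                continue
            sim = cosine(kernels[i], kernels[j])
            if sim >= best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            avg = [(x + y) / 2 for x, y in zip(kernels[i], kernels[best_j])]
            merged[i], merged[best_j] = avg, list(avg)
            used.update((i, best_j))
    return merged

if __name__ == "__main__":
    layer = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    for kernel in merge_similar_kernels(layer):
        print(kernel)
```

In this toy layer the first two kernels point in nearly the same direction and are averaged, while the third is orthogonal to both and is left untouched.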

Rethinking Weight-Averaged Model-merging

Nov 14, 2024

Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub

Abstract: Weight-averaged model-merging has emerged as a powerful approach in deep learning, capable of enhancing model performance without fine-tuning or retraining. However, the underlying mechanisms that explain its effectiveness remain largely unexplored. In this paper, we investigate this technique from three novel perspectives to provide deeper insights into how and why weight-averaged model-merging works: (1) we examine the intrinsic patterns captured by the learned model weights, visualizing them on several datasets and showing that these weights often encode structured and interpretable patterns; (2) we investigate model ensemble merging strategies based on averaging weights versus averaging features, providing detailed analyses across diverse architectures and datasets; and (3) we explore how changing the parameter magnitude affects the prediction stability of model-merging, revealing how weight averaging works as regularization by showing its robustness across different parameter scales. Our findings shed light on the "black box" of weight-averaged model-merging, offering valuable insights and practical recommendations that advance the model-merging process.
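
The distinction between averaging weights and averaging features can be seen in a toy example. The sketch below is not taken from the paper: it assumes a single ReLU unit with parameters `(weight, bias)`, and all function names are hypothetical. For this unit the two strategies agree when both models have non-negative pre-activations and can disagree otherwise.

```python
def relu(x):
    return max(0.0, x)

def model(params, x):
    """One ReLU unit: f(x) = relu(weight * x + bias)."""
    weight, bias = params
    return relu(weight * x + bias)

def weight_average(params_a, params_b):
    """Average two parameter tuples element by element."""
    return tuple((p + q) / 2 for p, q in zip(params_a, params_b))

def predict_weight_avg(params_a, params_b, x):
    """Merge first, then run a single forward pass."""
    return model(weight_average(params_a, params_b), x)

def predict_output_avg(params_a, params_b, x):
    """Run both models, then average their outputs (ensemble style)."""
    return (model(params_a, x) + model(params_b, x)) / 2

if __name__ == "__main__":
    x = 2.0
    # Opposite-sign weights: the nonlinearity makes the two strategies differ.
    print(predict_weight_avg((1.0, 0.0), (-1.0, 0.0), x))
    print(predict_output_avg((1.0, 0.0), (-1.0, 0.0), x))
    # Both pre-activations positive: the unit is locally linear, so they match.
    print(predict_weight_avg((1.0, 0.0), (3.0, 0.0), x))
    print(predict_output_avg((1.0, 0.0), (3.0, 0.0), x))
```

Weight averaging needs one forward pass at inference, whereas output averaging needs one pass per model, which is the efficiency argument usually made for merging over ensembling.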

A Comprehensive Survey on Deep Multimodal Learning with Missing Modality

Sep 12, 2024

Renjie Wu, Hu Wang, Hsiang-Ting Chen

Abstract: During multimodal model training and inference, data samples may lack certain modalities because of sensor limitations, cost constraints, privacy concerns, data loss, and temporal and spatial factors, which compromises model performance. This survey provides an overview of recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning techniques. It is the first comprehensive survey that covers the historical background and the distinction between MLMM and standard multimodal learning setups, followed by a detailed analysis of current MLMM methods, applications, and datasets, concluding with a discussion of challenges and potential future directions in the field.

* Work in progress; discussion is welcome

Human-AI Collaborative Multi-modal Multi-rater Learning for Endometriosis Diagnosis

Sep 03, 2024

Hu Wang, David Butler, Yuan Zhang, Jodie Avery, Steven Knox, Congbo Ma, Louise Hull, Gustavo Carneiro

Abstract: Endometriosis, affecting about 10% of individuals assigned female at birth, is challenging to diagnose and manage. Diagnosis typically involves the identification of various signs of the disease using either laparoscopic surgery or the analysis of T1/T2 MRI images, with the latter being quicker and cheaper but less accurate. A key diagnostic sign of endometriosis is the obliteration of the Pouch of Douglas (POD). However, even experienced clinicians struggle with accurately classifying POD obliteration from MRI images, which complicates the training of reliable AI models. In this paper, we introduce the Human-AI Collaborative Multi-modal Multi-rater Learning (HAICOMM) methodology to address the challenge above. HAICOMM is the first method that explores three important aspects of this problem: 1) multi-rater learning to extract a cleaner label from the multiple "noisy" labels available per training sample; 2) multi-modal learning to leverage the presence of T1/T2 MRI images for training and testing; and 3) human-AI collaboration to build a system that leverages the predictions from clinicians and the AI model to provide more accurate classification than standalone clinicians and AI models. Presenting results on the multi-rater T1/T2 MRI endometriosis dataset that we collected to validate our methodology, the proposed HAICOMM model outperforms an ensemble of clinicians, noisy-label learning models, and multi-rater learning methods.
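
Two of the ingredients named above, deriving one label from several noisy rater labels and combining a clinician's prediction with a model's, can be illustrated with deliberately simple baselines. The sketch below is not the HAICOMM method: majority voting and a fixed weighted average are stand-ins chosen for illustration, and the function names and the `model_weight` parameter are hypothetical.

```python
from collections import Counter

def majority_label(rater_labels):
    """Return the most common label; ties resolve to the smallest label."""
    counts = Counter(rater_labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

def combine_probabilities(p_clinician, p_model, model_weight=0.5):
    """Weighted average of two probabilities for the positive class."""
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must lie in [0, 1]")
    return (1.0 - model_weight) * p_clinician + model_weight * p_model

if __name__ == "__main__":
    # Three raters label one scan (1 = POD obliteration, 0 = none).
    print(majority_label([1, 0, 1]))
    # A clinician is unsure (0.5) while the model is confident (1.0).
    print(combine_probabilities(0.5, 1.0, model_weight=0.5))
```

A learned approach would typically replace both the vote and the fixed weight with quantities estimated from data; the baselines only make the inputs and outputs of each step concrete.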

Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

Jul 10, 2024

Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensation, resulting in poor performance on video sequences that include large motion. In this paper, we propose a Deformable Feature Alignment and Refinement (DFAR) method based on deformable convolution to explicitly use motion context in both the training and inference stages. Specifically, a Temporal Deformable Alignment (TDA) module based on the designed Dilated Convolution Attention Fusion (DCAF) block is developed to explicitly align the adjacent frames with the current frame at the feature level. Then, the feature refinement module adaptively fuses the aligned features and further aggregates useful spatio-temporal information by means of the proposed Attention-guided Deformable Fusion (AGDF) block. In addition, to improve the alignment of adjacent frames with the current frame, we extend the traditional loss function by introducing a new motion compensation loss. Extensive experimental results demonstrate that the proposed DFAR method achieves state-of-the-art performance on two benchmark datasets, DAUB and IRDST.

ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation

Jul 09, 2024

Yuyuan Liu, Yuanhong Chen, Hu Wang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The costly and time-consuming annotation process needed to produce large training sets for LiDAR semantic segmentation has motivated the development of semi-supervised learning (SSL) methods. However, such SSL approaches often concentrate on employing consistency learning only for individual LiDAR representations. This narrow focus results in limited perturbations that generally fail to enable effective consistency learning. Additionally, these SSL approaches employ contrastive learning based on sampling from a limited set of positive and negative embedding samples. This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the perturbation effectiveness in consistency learning. Furthermore, our contrastive learning employs informative samples drawn from a distribution of positive and negative embeddings learned from the entire training set. Results on public benchmarks show that our approach achieves remarkable improvements over the previous state-of-the-art (SOTA) methods in the field. The code is available at: https://github.com/yyliu01/IT2.

* 27 pages (15 pages main paper and 12 pages supplementary with references), ECCV 2024 accepted

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation

Jul 07, 2024

Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro

Abstract: Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging a transformer-based segmentation architecture, owing to its inherent ability to capture long-range dependencies and its flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is improved with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.

Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities

May 12, 2024

Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro

Abstract: In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is whether trained multi-modal models can maintain high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer knowledge from the modalities with higher importance weights to the modalities with lower importance weights. This cross-modal knowledge distillation produces a highly accurate model even in the absence of influential modalities. Unlike previous methods in the field, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. Particularly in BraTS2018, we achieve substantial improvements of 3.51% for enhancing tumor, 2.19% for tumor core, and 1.14% for the whole tumor in terms of average segmentation Dice score.
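
The pairwise distillation step described above, where knowledge flows from modalities with higher importance weights to those with lower ones, can be sketched as a loss over ordered modality pairs. This is not the MCKD implementation: the squared L2 distance, the scaling by the importance gap, the fixed importance values (the paper learns them by meta-learning), and the function names are all assumptions made for illustration.

```python
def l2_distance(a, b):
    """Squared L2 distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pairwise_distillation_loss(features, importance):
    """Sum a distillation term over every (teacher, student) modality pair.

    A pair contributes only when the teacher's importance exceeds the
    student's. Each term is the squared L2 distance between the two
    feature vectors, scaled by the importance gap (an assumed weighting).
    In a real training loop the teacher features would be held constant
    so that gradients flow only into the student branch.
    """
    loss = 0.0
    for teacher in features:
        for student in features:
            gap = importance[teacher] - importance[student]
            if gap > 0:
                loss += gap * l2_distance(features[teacher], features[student])
    return loss

if __name__ == "__main__":
    # Hypothetical per-modality features and importance weights.
    feats = {"t1": [1.0, 0.0], "flair": [0.0, 0.0]}
    weights = {"t1": 0.75, "flair": 0.25}
    print(pairwise_distillation_loss(feats, weights))
```

With equal importance weights no pair qualifies as teacher and student, so the loss is zero; the asymmetry comes entirely from the importance ordering.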

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

Dec 14, 2023

Renjie Wu, Hu Wang, Feras Dayoub, Hsiang-Ting Chen

Abstract: Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have a limited field-of-view (FoV) with front or downward perspectives. To address this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (SBV), a novel audio-visual semantic segmentation method. SBV supplements the visual modality, which misses the information beyond the FoV, with auditory information using a teacher-student distillation model (Omni2Ego). The model consists of a vision teacher utilising panoramic information, an auditory teacher with 8-channel audio, and an audio-visual student that takes views with limited FoV and binaural audio as input and produces semantic segmentation for objects outside the FoV. SBV outperforms existing models in comparative evaluations and shows consistent performance across varying FoV ranges and in monaural audio settings.

* Accepted by AAAI-24

Dynamically configured physics-informed neural network in topology optimization applications

Dec 12, 2023

Jichao Yin, Ziming Wen, Shuhao Li, Yaya Zhanga, Hu Wang

Abstract: Integration of machine learning (ML) into the topology optimization (TO) framework is attracting increasing attention, but data acquisition in data-driven models is prohibitive. Compared with popular ML methods, the physics-informed neural network (PINN) can avoid generating enormous amounts of data when solving forward problems and additionally provide better inference. To this end, a dynamically configured PINN-based topology optimization (DCPINN-TO) method is proposed. The DCPINN is composed of two subnetworks, namely the backbone neural network (NN) and the coefficient NN, where the coefficient NN has fewer trainable parameters. The designed architecture aims to dynamically configure trainable parameters; that is, an inexpensive NN is used to replace an expensive one at certain optimization cycles. Furthermore, an active sampling strategy is proposed to selectively sample collocations depending on the pseudo-densities at each optimization cycle. In this manner, the number of collocations decreases as the optimization proceeds, with hardly any effect on the optimization itself. The Gaussian integral is used to calculate the strain energy of elements, which yields a byproduct of decoupling the mapping of the material at the collocations. Several examples with different resolutions validate the feasibility of the DCPINN-TO method, and multiload and multiconstraint problems are employed to illustrate its generalization. In addition, compared to finite element analysis-based TO (FEA-TO), the accuracy of the displacement prediction and optimization results indicates that the DCPINN-TO method is effective and efficient.

* 31 pages, 22 figures
