Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David A. Clifton

Department of Engineering Science, University of Oxford, Oxford, UK, Oxford Suzhou Centre for Advanced Research, University of Oxford, Suzhou, Jiangsu, China

Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Feb 03, 2025

Boyan Gao, Bo Zhao, Shreyank N Gowda, Xingrun Xing, Yibo Yang, Timothy Hospedales, David A. Clifton

Figure 1 for Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Figure 2 for Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Figure 3 for Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Figure 4 for Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Abstract:Dataset condensation aims to synthesize datasets with a few representative samples that can effectively represent the original datasets. This enables efficient training and produces models with performance close to those trained on the original sets. Most existing dataset condensation methods conduct dataset learning under the bilevel (inner- and outer-loop) based optimization. However, the preceding methods perform with limited dataset generalization due to the notoriously complicated loss landscape and expensive time-space complexity of the inner-loop unrolling of bilevel optimization. These issues deteriorate when the datasets are learned via matching the trajectories of networks trained on the real and synthetic datasets with a long horizon inner-loop. To address these issues, we introduce Sharpness-Aware Trajectory Matching (SATM), which enhances the generalization capability of learned synthetic datasets by optimising the sharpness of the loss landscape and objective simultaneously. Moreover, our approach is coupled with an efficient hypergradient approximation that is mathematically well-supported and straightforward to implement along with controllable computational overhead. Empirical evaluations of SATM demonstrate its effectiveness across various applications, including in-domain benchmarks and out-of-domain settings. Moreover, its easy-to-implement properties afford flexibility, allowing it to integrate with other advanced sharpness-aware minimizers. Our code will be released.

Via

Access Paper or Ask Questions

Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Jan 09, 2025

Xiaojie Li, Yibo Yang, Jianlong Wu, David A. Clifton, Yue Yu, Bernard Ghanem, Min Zhang

Figure 1 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 2 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 3 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 4 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Abstract:Few-shot class-incremental learning (FSCIL) involves learning new classes from limited data while retaining prior knowledge, and often results in catastrophic forgetting. Existing methods either freeze backbone networks to preserve knowledge, which limits adaptability, or rely on additional modules or prompts, introducing inference overhead. To this end, we propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts: one that compacts existing knowledge (knowledge-sensitive components) and another that carries redundant capacity to accommodate new abilities (redundant-capacity components). The decomposition is guided by a covariance matrix from replay samples, ensuring principal components align with classification abilities. During adaptation, we freeze the knowledge-sensitive components and only adapt the redundant-capacity components, fostering plasticity while minimizing interference without changing the architecture or increasing overhead. Additionally, CKPD introduces an adaptive layer selection strategy to identify layers with redundant capacity, dynamically allocating adapters. Experiments on multiple benchmarks show that CKPD-FSCIL outperforms state-of-the-art methods.

* Code: https://github.com/xiaojieli0903/CKPD-FSCIL

Via

Access Paper or Ask Questions

Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Dec 24, 2024

Sihao Liu, Yibo Yang, Xiaojie Li, David A. Clifton, Bernard Ghanem

Figure 1 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 2 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 3 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 4 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Abstract:Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks. Most existing methods rely on replay, focusing on enhancing memory retention through regularization or distillation. However, they often overlook the adaptability of the model, limiting the ability to learn generalizable and discriminative features incrementally from online training data. To address this, we introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability. Specifically, S6MOD introduces an extra branch after the backbone, where a mixture of discretization selectively adjusts parameters in a selective state space model, enriching selective scan patterns such that the model can adaptively select the most sensitive discretization method for current dynamics. We further design a class-conditional routing algorithm for dynamic, uncertainty-based adjustment and implement a contrastive discretization loss to optimize it. Extensive experiments combining our module with various models demonstrate that S6MOD significantly enhances model adaptability, leading to substantial performance gains and achieving the state-of-the-art results.

Via

Access Paper or Ask Questions

Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis

Dec 05, 2024

Anshul Thakur, Yichen Huang, Soheila Molaei, Yujiang Wang, David A. Clifton

Abstract:Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in various machine learning applications, but they often suffer from negative transfer, leading to performance degradation in specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning - known as task grouping - remains underexplored and computationally challenging due to the exponential growth in task combinations and the need for extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce these overwhelming computational demands of the existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the need for the shared model training required to infer task similarities in existing methods. With task similarities acquired, a graph-based clustering algorithm is employed to pinpoint near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments conducted on 8 different datasets highlight the effectiveness of the proposed framework, revealing a five-fold speed enhancement compared to previous state-of-the-art methods. Moreover, the framework consistently demonstrates comparable performance, confirming its remarkable efficiency and effectiveness in task grouping.

* Under review at IEEE Transactions on Pattern Analysis and Machine Intelligence

Via

Access Paper or Ask Questions

FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Aug 05, 2024

Shreyank N Gowda, Boyan Gao, David A. Clifton

Figure 1 for FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Figure 2 for FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Figure 3 for FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Figure 4 for FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

Abstract:Utilizing large pre-trained models for specific tasks has yielded impressive results. However, fully fine-tuning these increasingly large models is becoming prohibitively resource-intensive. This has led to a focus on more parameter-efficient transfer learning, primarily within the same modality. But this approach has limitations, particularly in video understanding where suitable pre-trained models are less common. Addressing this, our study introduces a novel cross-modality transfer learning approach from images to videos, which we call parameter-efficient image-to-video transfer learning. We present the Facial-Emotion Adapter (FE-Adapter), designed for efficient fine-tuning in video tasks. This adapter allows pre-trained image models, which traditionally lack temporal processing capabilities, to analyze dynamic video content efficiently. Notably, it uses about 15 times fewer parameters than previous methods, while improving accuracy. Our experiments in video emotion recognition demonstrate that the FE-Adapter can match or even surpass existing fine-tuning and video emotion models in both performance and efficiency. This breakthrough highlights the potential for cross-modality approaches in enhancing the capabilities of AI models, particularly in fields like video emotion analysis where the demand for efficiency and accuracy is constantly rising.

Via

Access Paper or Ask Questions

CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation

Jul 31, 2024

Shreyank N Gowda, David A. Clifton

Abstract:The Segment Anything Model (SAM) has achieved remarkable successes in the realm of natural image segmentation, but its deployment in the medical imaging sphere has encountered challenges. Specifically, the model struggles with medical images that feature low contrast, faint boundaries, intricate morphologies, and small-sized objects. To address these challenges and enhance SAM's performance in the medical domain, we introduce a comprehensive modification. Firstly, we incorporate a frozen Convolutional Neural Network (CNN) branch as an image encoder, which synergizes with SAM's original Vision Transformer (ViT) encoder through a novel variational attention fusion module. This integration bolsters the model's capability to capture local spatial information, which is often paramount in medical imagery. Moreover, to further optimize SAM for medical imaging, we introduce feature and position adapters within the ViT branch, refining the encoder's representations. We see that compared to current prompting strategies to fine-tune SAM for ultrasound medical segmentation, the use of text descriptions that serve as text prompts for SAM helps significantly improve the performance. Leveraging ChatGPT's natural language understanding capabilities, we generate prompts that offer contextual information and guidance to SAM, enabling it to better understand the nuances of ultrasound medical images and improve its segmentation accuracy. Our method, in its entirety, represents a significant stride towards making universal image segmentation models more adaptable and efficient in the medical domain.

* Accepted to ECCV 2024

Via

Access Paper or Ask Questions

Masks and Manuscripts: Advancing Medical Pre-training with End-to-End Masking and Narrative Structuring

Jul 23, 2024

Shreyank N Gowda, David A. Clifton

Abstract:Contemporary medical contrastive learning faces challenges from inconsistent semantics and sample pair morphology, leading to dispersed and converging semantic shifts. The variability in text reports, due to multiple authors, complicates semantic consistency. To tackle these issues, we propose a two-step approach. Initially, text reports are converted into a standardized triplet format, laying the groundwork for our novel concept of ``observations'' and ``verdicts''. This approach refines the {Entity, Position, Exist} triplet into binary questions, guiding towards a clear ``verdict''. We also innovate in visual pre-training with a Meijering-based masking, focusing on features representative of medical images' local context. By integrating this with our text conversion method, our model advances cross-modal representation in a multimodal contrastive learning framework, setting new benchmarks in medical image analysis.

* Accepted in MICCAI-24

Via

Access Paper or Ask Questions

Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

Jul 14, 2024

Omid Rohanian, Mohammadmahdi Nouriborji, Olena Seminog, Rodrigo Furst, Thomas Mendy, Shanthi Levanita, Zaharat Kadri-Alab, Nusrat Jabin, Daniela Toale, Georgina Humphreys(+4 more)

Figure 1 for Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

Figure 2 for Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

Figure 3 for Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

Figure 4 for Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

Abstract:This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset. PPACE is a fine-tuned model developed to automatically classify research abstracts from funded biomedical projects according to WHO-aligned research priorities. This task is crucial for monitoring research trends and identifying gaps in global health preparedness and response. Our approach builds on human-annotated projects, which are allocated one or more categories from a predefined list. A large language model is then used to generate `rationales' explaining the reasoning behind these annotations. This augmented data, comprising expert annotations and rationales, is subsequently used to fine-tune a smaller, more efficient model. Developed as part of the Pandemic PACT project, which aims to track and analyse research funding and clinical evidence for a wide range of diseases with outbreak potential, PPACE supports informed decision-making by research funders, policymakers, and independent researchers. We introduce and release both the trained model and the instruction-based dataset used for its training. Our evaluation shows that PPACE significantly outperforms its baselines. The release of PPACE and its associated dataset offers valuable resources for researchers in multilabel biomedical document classification and supports advancements in aligning biomedical research with key global health priorities.

Via

Access Paper or Ask Questions

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Jul 05, 2024

Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang

Figure 1 for SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Figure 2 for SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Figure 3 for SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Figure 4 for SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Abstract:The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological neurons, exhibit significantly greater energy efficiency compared to LLMs with a similar number of parameters. Inspired by this, we redesign 7 to 70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model as recent LLMs termed SpikeLLM. Coupled with the proposed model, a novel spike-driven quantization framework named Optimal Brain Spiking is introduced to reduce the energy cost and accelerate inference speed via two essential approaches: first (second)-order differentiation-based salient channel detection, and per-channel salient outlier expansion with Generalized Integrate-and-Fire neurons. Our proposed spike-driven quantization can plug in main streams of quantization training methods. In the OmniQuant pipeline, SpikeLLM significantly reduces 25.51% WikiText2 perplexity and improves 3.08% average accuracy of 6 zero-shot datasets on a LLAMA2-7B 4A4W model. In the GPTQ pipeline, SpikeLLM realizes a sparse ternary quantization, which achieves additive in all linear layers. Compared with PB-LLM with similar operations, SpikeLLM also exceeds significantly. We will release our code on GitHub.

Via

Access Paper or Ask Questions

Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection

Jun 20, 2024

Rushuang Zhou, Zijun Liu, Lei Clifton, David A. Clifton, Kannie W. Y. Chan, Yuan-Ting Zhang, Yining Dong

Abstract:Label scarcity problem is the main challenge that hinders the wide application of deep learning systems in automatic cardiovascular diseases (CVDs) detection using electrocardiography (ECG). Tuning pre-trained models alleviates this problem by transferring knowledge learned from large datasets to downstream small datasets. However, bottlenecks in computational efficiency and CVDs detection performance limit its clinical applications. It is difficult to improve the detection performance without significantly sacrificing model computational efficiency. Here, we propose a computation-efficient semi-supervised learning paradigm (FastECG) for robust and computation-efficient CVDs detection using ECG. It enables a robust adaptation of pre-trained models on downstream datasets with limited supervision and high computational efficiency. First, a random-deactivation technique is developed to achieve robust and fast low-rank adaptation of pre-trained weights. Subsequently, we propose a one-shot rank allocation module to determine the optimal ranks for the update matrices of the pre-trained weights. Finally, a lightweight semi-supervised learning pipeline is introduced to enhance model performance by leveraging labeled and unlabeled data with high computational efficiency. Extensive experiments on four downstream ECG datasets demonstrate that FastECG not only outperforms the state-of-the-art methods in multi-label CVDs detection but also consumes fewer GPU footprints, training time, and parameter storage space. As such, this paradigm provides an effective solution for achieving high computational efficiency and robust detection performance in the clinical applications of pre-trained models under limited supervision.

Via

Access Paper or Ask Questions