Graham W. Taylor

BarcodeBERT: Transformers for Biodiversity Analysis

Nov 04, 2023
Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Scott C. Lowe, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Graham W. Taylor

Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification demands a paradigm shift towards self-supervised pretraining. We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a reference library of 1.5 million invertebrate DNA barcodes. This work highlights how dataset specifics and coverage impact model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus levels. Indeed, without fine-tuning, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks. The code repository is available at https://github.com/Kari-Genomics-Lab/BarcodeBERT

* Main text: 5 pages, Total: 9 pages, 2 figures, accepted at the 4th Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2023) 
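
As a reading aid, here is a minimal sketch of the kind of DNA-barcode tokenization and masking the abstract describes: non-overlapping k-mer tokenization followed by BERT-style masking. The k-mer size, vocabulary layout, and masking ratio are assumptions for illustration, not the authors' exact configuration (see the linked repository for that).

```python
# Hedged sketch: k-mer tokenization and BERT-style masking for DNA barcodes.
import random
from itertools import product

K = 4  # assumed k-mer size
MASK, PAD, UNK = "[MASK]", "[PAD]", "[UNK]"
vocab = {tok: i for i, tok in enumerate(
    [PAD, MASK, UNK] + ["".join(p) for p in product("ACGT", repeat=K)]
)}

def tokenize(seq):
    """Split a barcode into non-overlapping k-mers and map them to vocab ids."""
    kmers = [seq[i:i + K] for i in range(0, len(seq) - K + 1, K)]
    return [vocab.get(kmer, vocab[UNK]) for kmer in kmers]

def mask_tokens(ids, ratio=0.15):
    """Mask a random subset of tokens; unmasked positions get label -100."""
    inputs, labels = list(ids), [-100] * len(ids)
    for i in range(len(ids)):
        if random.random() < ratio:
            labels[i] = ids[i]
            inputs[i] = vocab[MASK]
    return inputs, labels

barcode = "AACATTATATTTTATTTTTGGAATTTGAGCAGG"  # toy COI-like snippet
inputs, labels = mask_tokens(tokenize(barcode))
print(inputs, labels)
```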

Bandit-Driven Batch Selection for Robust Learning under Label Noise

Oct 31, 2023
Michal Lisicki, Mihai Nica, Graham W. Taylor

We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across various levels of label corruption. Importantly, we achieve this superior performance without incurring the computational overhead commonly associated with auxiliary neural network models. This work presents a balanced trade-off between computational efficiency and model efficacy, offering a scalable solution for complex machine learning applications.

* WANT@NeurIPS 2023 & OPT@NeurIPS 2023 
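
The abstract does not spell out the algorithm, so the following is only a plausible sketch under stated assumptions: an EXP3-style sampler over training examples, with the per-example drop in loss as the reward. The paper's actual combinatorial bandit and reward definition may differ.

```python
# Hedged sketch: EXP3-flavoured batch selection with loss reduction as reward.
import numpy as np

rng = np.random.default_rng(0)
n, batch_size, eta = 1000, 32, 0.05
log_w = np.zeros(n)  # log-weights over training examples

def probs():
    p = np.exp(log_w - log_w.max())
    return p / p.sum()

def select_batch():
    return rng.choice(n, size=batch_size, replace=False, p=probs())

def update(batch, reward):
    # Importance-weighted update on the chosen arms only.
    p = probs()
    log_w[batch] += eta * reward / np.maximum(p[batch] * n, 1e-8)

# Toy loop: the "SGD step" is simulated; reward is the per-example loss drop.
loss = rng.uniform(0.5, 2.0, size=n)
for step in range(100):
    batch = select_batch()
    new_loss = loss[batch] * rng.uniform(0.9, 1.0, size=batch_size)
    update(batch, reward=loss[batch] - new_loss)
    loss[batch] = new_loss
```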

Which Tokens to Use? Investigating Token Reduction in Vision Transformers

Aug 09, 2023
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods significantly differ from fixed radial patterns, and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally, we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens.

* ICCV 2023 NIVT Workshop. Project webpage https://vap.aau.dk/tokens 
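
The Top-K pruning baseline the abstract highlights is simple enough to sketch: score each patch token by the CLS token's head-averaged attention to it, and keep the K highest-scoring tokens. The scoring rule and single-layer placement below are illustrative assumptions.

```python
# Hedged sketch: Top-K token pruning driven by CLS attention in a ViT block.
import torch

def topk_prune(tokens, attn, k):
    """
    tokens: (B, 1 + N, D) with the CLS token at index 0
    attn:   (B, heads, 1 + N, 1 + N) attention weights from the previous block
    """
    cls_to_patch = attn[:, :, 0, 1:].mean(dim=1)       # (B, N), head-averaged
    keep = cls_to_patch.topk(k, dim=1).indices          # indices of kept patches
    keep, _ = keep.sort(dim=1)                          # preserve spatial order
    batch_idx = torch.arange(tokens.size(0)).unsqueeze(1)
    patches = tokens[:, 1:][batch_idx, keep]            # (B, k, D)
    return torch.cat([tokens[:, :1], patches], dim=1)   # re-attach CLS

B, N, D, H = 2, 196, 384, 6
tokens = torch.randn(B, 1 + N, D)
attn = torch.softmax(torch.randn(B, H, 1 + N, 1 + N), dim=-1)
print(topk_prune(tokens, attn, k=98).shape)  # torch.Size([2, 99, 384])
```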

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

Jul 19, 2023
Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth

In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert and also has associated genetic information, including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily intended to train computer-vision models capable of providing image-based taxonomic assessment; however, the dataset also presents compelling characteristics whose study would be of interest to the broader machine learning community. Owing to the biological nature of the data, the dataset exhibits a characteristic long-tailed class-imbalance distribution. Furthermore, taxonomic labelling follows a hierarchical classification scheme, presenting a highly fine-grained classification problem at the lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.
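
As a small illustration of working with the long-tailed label distribution the abstract describes, here is a hedged sketch of inverse-frequency sampling in PyTorch; the order names are toy stand-ins, and the actual dataset schema is documented in the BIOSCAN-1M release.

```python
# Hedged sketch: class-balanced sampling for a long-tailed insect dataset.
from collections import Counter
import torch
from torch.utils.data import WeightedRandomSampler

labels = ["Diptera"] * 800 + ["Hymenoptera"] * 150 + ["Coleoptera"] * 50  # toy tail
counts = Counter(labels)
weights = torch.tensor([1.0 / counts[y] for y in labels])  # upweight rare classes

sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
# Pass sampler=sampler to a DataLoader so each epoch sees a more balanced stream.
```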

Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data

Jul 03, 2023
Kevin Kasa, Graham W. Taylor

Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. Yet, its performance is known to degrade under distribution shift and long-tailed class distributions, which are often present in real-world applications. Here, we characterize the performance of several post-hoc and training-based conformal prediction methods under these settings, providing the first empirical evaluation on large-scale datasets and models. We show that, across numerous conformal methods and neural network families, performance degrades greatly under distribution shift, violating safety guarantees. Similarly, we show that in long-tailed settings the guarantees are frequently violated on many classes. Understanding the limitations of these methods is necessary for deployment in real-world and safety-critical applications.
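
For readers unfamiliar with the method being stress-tested, a minimal sketch of split conformal prediction with the basic softmax score follows, together with the empirical coverage check that reveals violations under shift. The paper evaluates several methods; this shows only the simplest recipe.

```python
# Hedged sketch: split conformal prediction and an empirical coverage check.
import numpy as np

def calibrate(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out calibration set (score = 1 - p_true)."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    q = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
    return np.quantile(scores, min(q, 1.0), method="higher")

def predict_sets(test_probs, qhat):
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

def empirical_coverage(sets, labels):
    return np.mean([y in s for s, y in zip(sets, labels)])

# Toy run: coverage is ~1 - alpha in-distribution and typically drops when
# the test probabilities come from a shifted distribution.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=500)
labels = rng.integers(0, 10, size=500)
qhat = calibrate(probs[:250], labels[:250])
print(empirical_coverage(predict_sets(probs[250:], qhat), labels[250:]))
```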

Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Mar 24, 2023
Cong Wei, Brendan Duke, Ruowei Jiang, Parham Aarabi, Graham W. Taylor, Florian Shkurti

Vision Transformers (ViT) have shown competitive performance advantages over convolutional neural networks (CNNs), though they often come with high computational costs. To address this, previous methods explore different attention patterns by limiting attention to a fixed number of spatially nearby tokens, thereby accelerating the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit token-to-token connections to their spatial relevance, which disregards the learned semantic connections of a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns by devising a lightweight connectivity predictor module that estimates the connectivity score of each pair of tokens. Intuitively, two tokens have high connectivity scores if their features are considered relevant either spatially or semantically. Since each token attends to only a small number of other tokens, the binarized connectivity masks are very sparse by nature and therefore provide the opportunity to accelerate the network via sparse computations. Equipped with the learned unstructured attention pattern, the sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity alone. Our method reduces the FLOPs of MHSA by 48% to 69% while keeping the accuracy drop within 0.4%. We also show that combining attention and token sparsity reduces ViT FLOPs by over 60%.

* Accepted at CVPR 2023 
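
A minimal sketch of the core mechanism, under stated assumptions: a low-rank predictor scores token pairs, a mask keeps each query's top-k connections, and attention is evaluated under that mask. Sparsifiner's actual predictor architecture, multi-head handling, and connectivity budget are not reproduced here.

```python
# Hedged sketch: instance-dependent sparse attention via a connectivity mask.
import torch
import torch.nn.functional as F

def sparse_attention(x, Wq, Wk, Wv, Wp, k=16):
    B, N, D = x.shape
    # Lightweight connectivity predictor: scores in a small latent space.
    z = x @ Wp                                            # (B, N, d_small)
    conn = z @ z.transpose(1, 2)                          # (B, N, N) pair scores
    mask = torch.full_like(conn, float("-inf"))
    mask.scatter_(-1, conn.topk(k, dim=-1).indices, 0.0)  # keep top-k per query

    q, key, v = x @ Wq, x @ Wk, x @ Wv                    # single head for brevity
    attn = (q @ key.transpose(1, 2)) / D ** 0.5 + mask    # masked logits
    return F.softmax(attn, dim=-1) @ v                    # sparse in effect

B, N, D = 2, 64, 32
x = torch.randn(B, N, D)
Wq, Wk, Wv = (torch.randn(D, D) for _ in range(3))
Wp = torch.randn(D, 8)
print(sparse_attention(x, Wq, Wk, Wv, Wp).shape)  # torch.Size([2, 64, 32])
```

A real implementation would realize the FLOP savings with sparse kernels rather than a dense mask; the dense form above only shows the connectivity logic.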

GCNet: Probing Self-Similarity Learning for Generalized Counting Network

Feb 10, 2023
Mingjie Wang, Yande Li, Jun Zhou, Graham W. Taylor, Minglun Gong

The class-agnostic counting (CAC) problem has attracted increasing attention recently due to its wide societal applications and arduous challenges. To count objects of different categories, existing approaches rely on user-provided exemplars, which are hard to obtain and limit generality. In this paper, we aim to empower the framework to recognize adaptive exemplars within whole images. We develop a zero-shot Generalized Counting Network (GCNet), which uses a pseudo-Siamese structure to automatically and effectively learn pseudo exemplar clues from inherent repetition patterns. In addition, a weakly-supervised scheme is presented to reduce the burden of the laborious density maps required by all contemporary CAC models, allowing GCNet to be trained using count-level supervisory signals in an end-to-end manner. Without any spatial location hints, GCNet adaptively captures these clues through a carefully-designed self-similarity learning strategy. Extensive experiments and ablation studies on the prevailing zero-shot CAC benchmark FSC147 demonstrate the superiority of GCNet. It performs on par with existing exemplar-dependent methods and shows strong cross-dataset generality on crowd-specific datasets such as ShanghaiTech Part A, Part B, and UCF_QNRF.
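
The self-similarity cue at the heart of the method can be sketched compactly: cosine similarity between every pair of spatial features exposes repeated patterns without user-provided exemplars. GCNet's pseudo-Siamese design and count-level loss are not reproduced in this toy.

```python
# Hedged sketch: a pairwise self-similarity map over CNN features.
import torch
import torch.nn.functional as F

def self_similarity(feat):
    """feat: (B, C, H, W) -> (B, H*W, H*W) pairwise cosine similarity."""
    flat = F.normalize(feat.flatten(2), dim=1)  # unit-norm channel vectors
    return flat.transpose(1, 2) @ flat          # (B, HW, HW)

feat = torch.randn(2, 64, 14, 14)
sim = self_similarity(feat)
# Crude repetition score per location: mean similarity to all other locations.
repetition = sim.mean(dim=-1).reshape(2, 14, 14)
print(repetition.shape)  # torch.Size([2, 14, 14])
```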

Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition

Jan 19, 2023
Juan Carrasquilla, Mohamed Hibat-Allah, Estelle Inack, Alireza Makhzani, Kirill Neklyudov, Graham W. Taylor, Giacomo Torlai

Binary neural networks, i.e., neural networks whose parameters and activations are constrained to only two possible values, offer a compelling avenue for the deployment of deep learning models on energy- and memory-limited devices. However, their training, architectural design, and hyperparameter tuning remain challenging, as these involve multiple computationally expensive combinatorial optimization problems. Here we introduce quantum hypernetworks as a mechanism to train binary neural networks on quantum computers, unifying the search over parameters, hyperparameters, and architectures in a single optimization loop. Through classical simulations, we demonstrate that our approach effectively finds optimal parameters, hyperparameters, and architectural choices with high probability on classification problems, including a two-dimensional Gaussian dataset and a scaled-down version of the MNIST handwritten digits. We represent our quantum hypernetworks as variational quantum circuits and find that an optimal circuit depth maximizes the probability of finding performant binary neural networks. Our unified approach provides immense scope for other applications in the field of machine learning.

* 10 pages, 6 figures. Minimal implementation: https://github.com/carrasqu/binncode 
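
The minimal implementation linked above is the authoritative reference; the toy below is only a mean-field caricature of the idea, assuming one qubit per binary weight with a product-state RY ansatz (no entanglement), sampled binary weights, and finite-difference updates standing in for parameter-shift gradients on a real variational circuit.

```python
# Hedged sketch: sampling binary network weights from a parameterized
# (product) quantum state and minimizing the expected classification error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sign(X @ np.array([1.0, -1.0, 1.0, -1.0]))  # toy separable labels
theta = np.full(4, np.pi / 2)                      # RY angles; p(+1) = sin^2(theta/2)

def sample_weights(theta, n_shots=64):
    p = np.sin(theta / 2) ** 2
    return np.where(rng.uniform(size=(n_shots, theta.size)) < p, 1.0, -1.0)

def expected_loss(theta):
    w = sample_weights(theta)        # each row: one sampled binary network
    preds = np.sign(X @ w.T)         # (samples, shots)
    return np.mean(preds != y[:, None])

# Finite-difference descent over the circuit angles.
eps, lr = 0.1, 0.5
for step in range(200):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (expected_loss(theta + d) - expected_loss(theta - d)) / (2 * eps)
    theta -= lr * g
print("final expected error:", expected_loss(theta))
```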

Bounding generalization error with input compression: An empirical study with infinite-width networks

Jul 19, 2022
Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor

Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE from a single training set may yield overarching DNN design principles that reduce reliance on trial and error, along with other performance-assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final-layer representations, using the infinite-width DNN limit to bound MI. An existing input-compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable wherever MI can be estimated with confidence.

* 12 pages main content, 26 pages total 
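
For orientation, the input-compression argument turns an MI estimate into a generalization-error bound of roughly the following form; the exact statement, constants, and the infinite-width (NNGP) MI bound are in the paper, so treat this as an assumed plug-in sketch.

```python
# Hedged sketch: an input-compression-style GE bound from an MI estimate.
import numpy as np

def input_compression_bound(mi_bits, m, delta=0.05):
    """Assumed form: sqrt((2^I(X;T) + log(2/delta)) / (2m)), MI in bits."""
    return np.sqrt((2.0 ** mi_bits + np.log(2.0 / delta)) / (2.0 * m))

# E.g., an MI estimate of ~10 bits with 50k training samples:
print(input_compression_bound(mi_bits=10.0, m=50_000))  # ~0.10
```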

Monitoring Shortcut Learning using Mutual Information

Jun 27, 2022
Mohammed Adnan, Yani Ioannou, Chuan-Yung Tsai, Angus Galloway, H. R. Tizhoosh, Graham W. Taylor

The failure of deep neural networks to generalize to out-of-distribution data is a well-known problem and raises concerns about the deployment of trained networks in safety-critical domains such as healthcare, finance, and autonomous vehicles. We study a particular kind of distribution shift: shortcuts, or spurious correlations, in the training data. Shortcut learning is often only exposed when models are evaluated on real-world data that does not contain the same spurious correlations, posing a serious dilemma for AI practitioners trying to properly assess the effectiveness of a trained model for real-world applications. In this work, we propose to use the mutual information (MI) between the learned representation and the input as a metric to find where in training the network latches onto shortcuts. Experiments demonstrate that MI can be used as a domain-agnostic metric for monitoring shortcut learning.

* Accepted at ICML 2022 Workshop on Spurious Correlations, Invariance, and Stability 
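
A minimal sketch of one common way to track such an MI signal during training: for a deterministic network, I(X;T) reduces to H(T), which can be estimated by discretizing the representation and histogramming. The bin count and estimator here are assumptions, not necessarily the paper's choices.

```python
# Hedged sketch: binning-based MI estimate, I(X;T) = H(T) for deterministic nets.
import numpy as np

def mi_via_binning(reps, n_bins=30):
    """Estimate H(T) in bits by histogramming discretized representations."""
    lo, hi = reps.min(), reps.max()
    binned = np.floor((reps - lo) / max(hi - lo, 1e-12) * (n_bins - 1)).astype(int)
    _, counts = np.unique(binned, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
reps = rng.normal(size=(512, 8))  # a batch of learned representations
print(mi_via_binning(reps))       # log this once per epoch to monitor shortcuts
```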