Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

William Andrew Simon

Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling

Apr 09, 2025

Riselda Kodra, Hadjer Benmeziane, Irem Boybat, William Andrew Simon

Abstract:The ability to quickly and accurately identify microbial species in a sample, known as metagenomic profiling, is critical across various fields, from healthcare to environmental science. This paper introduces a novel method to profile signals coming from sequencing devices in parallel with determining their nucleotide sequences, a process known as basecalling, via a multi-objective deep neural network for simultaneous basecalling and multi-class genome classification. We introduce a new loss strategy where losses for basecalling and classification are back-propagated separately, with model weights combined for the shared layers, and a pre-configured ranking strategy allowing top-K species accuracy, giving users flexibility to choose between higher accuracy or higher speed at identifying the species. We achieve state-of-the-art basecalling accuracies, while classification accuracies meet and exceed the results of state-of-the-art binary classifiers, attaining an average of 92.5%/98.9% accuracy at identifying the top-1/3 species among a total of 17 genomes in the Wick bacterial dataset. The work presented here has implications for future studies in metagenomic profiling by accelerating the bottleneck step of matching the DNA sequence to the correct genome.

* Accepted as Tiny Paper at MLGenX workshop, ICLR, 2025

Via

Access Paper or Ask Questions

HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Jun 09, 2022

William Andrew Simon, Una Pale, Tomas Teijeiro, David Atienza

Figure 1 for HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Figure 2 for HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Figure 3 for HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Figure 4 for HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Abstract:HyperDimensional Computing (HDC) as a machine learning paradigm is highly interesting for applications involving continuous, semi-supervised learning for long-term monitoring. However, its accuracy is not yet on par with other Machine Learning (ML) approaches. Frameworks enabling fast design space exploration to find practical algorithms are necessary to make HD computing competitive with other ML techniques. To this end, we introduce HDTorch, an open-source, PyTorch-based HDC library with CUDA extensions for hypervector operations. We demonstrate HDTorch's utility by analyzing four HDC benchmark datasets in terms of accuracy, runtime, and memory consumption, utilizing both classical and online HD training methodologies. We demonstrate average (training)/inference speedups of (111x/68x)/87x for classical/online HD, respectively. Moreover, we analyze the effects of varying hyperparameters on runtime and accuracy. Finally, we demonstrate how HDTorch enables exploration of HDC strategies applied to large, real-world datasets. We perform the first-ever HD training and inference analysis of the entirety of the CHB-MIT EEG epilepsy database. Results show that the typical approach of training on a subset of the data does not necessarily generalize to the entire dataset, an important factor when developing future HD models for medical wearable devices.

* Submitted to the ICCAD 2022 conference (23.5.2022.)

Via

Access Paper or Ask Questions