Joonas Nättilä

aweSOM: a CPU/GPU-accelerated Self-organizing Map and Statistically Combined Ensemble Framework for Machine-learning Clustering Analysis

Apr 13, 2025

Trung Ha, Joonas Nättilä, Jordy Davelaar

Abstract: We introduce aweSOM, an open-source Python package for machine learning (ML) clustering and classification, using a Self-organizing Map (SOM) algorithm that incorporates CPU/GPU acceleration to accommodate large ($N > 10^6$, where $N$ is the number of data points), multidimensional datasets. aweSOM consists of two main modules: one that handles the initialization and training of the SOM, and another that stacks the results of multiple SOM realizations to obtain more statistically robust clusters. Existing Python-based SOM implementations (e.g., POPSOM, Yuan (2018); MiniSom, Vettigli (2018); sklearn-som) primarily serve as proof-of-concept demonstrations, optimized for smaller datasets but lacking scalability for large, multidimensional data. aweSOM fills this gap in capability, with good performance scaling up to $\sim 10^8$ individual points and support for multiple features per point. We compare the code performance against the legacy implementations it is based on and find a 10-100x speedup, as well as significantly improved memory efficiency, due to several built-in optimizations.

* Published in the Journal of Open Source Software; method paper for arXiv: 2410.01878
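
For readers unfamiliar with SOMs, the training step the abstract refers to can be illustrated with a minimal sketch of the classic online SOM update rule. This is not aweSOM's API or its optimized implementation: the function names (`train_som`, `best_matching_unit`), the decay schedules, and the synthetic two-blob data are all assumptions made for illustration, and plain Python is used only to keep the example dependency-free.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the lattice coordinates of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((wk - xk) ** 2 for wk, xk in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM with the classic online update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per lattice node, initialized from random data points.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood weight
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])      # pull node toward the sample
    return weights


# Usage on synthetic data: two well-separated 2-D blobs (made-up values).
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(100)]
blob_b = [[rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)] for _ in range(100)]
som = train_som(blob_a + blob_b, rows=2, cols=2)
node_a = best_matching_unit(som, [0.0, 0.0])
node_b = best_matching_unit(som, [10.0, 10.0])
```

After training, the two blob centers map to different lattice nodes, which is the property a clustering step built on the map would rely on. At the dataset sizes quoted in the abstract, the inner loops would need to be vectorized or run on a GPU.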

Segmentation of turbulent computational fluid dynamics simulations with unsupervised ensemble learning

Sep 03, 2021

Maarja Bussov, Joonas Nättilä

Abstract: Computer vision and machine learning tools offer an exciting new way to automatically analyze and categorize information from complex computer simulations. Here we design an ensemble machine learning framework that can independently and robustly categorize and dissect the output of turbulent flow simulations into distinct structure catalogues. The segmentation is performed using an unsupervised clustering algorithm, which segments physical structures by grouping together similar pixels in simulation images. The accuracy and robustness of the resulting segment region boundaries are enhanced by combining information from multiple simultaneously evaluated clustering operations. The stacking of object segmentation evaluations is performed using image mask combination operations. This statistically combined ensemble (SCE) of different cluster masks allows us to construct cluster reliability metrics for each pixel and for the associated segments without any prior user input. By comparing the similarity of different cluster occurrences in the ensemble, we can also assess the optimal number of clusters needed to describe the data. Furthermore, by relying on ensemble-averaged spatial segment region boundaries, the SCE method enables reconstruction of more accurate and robust region of interest (ROI) boundaries for the different image data clusters. We apply the SCE algorithm to 2-dimensional simulation data snapshots of magnetically dominated, fully kinetic turbulent plasma flows, where accurate ROI boundaries are needed for geometrical measurements of intermittent flow structures known as current sheets.

* 15 pages, 8 figures. Accepted to Signal Processing: Image Communication. Code available from a repository: https://github.com/mkruuse/segmenting-turbulent-simulations-with-ensemble-learning
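
The idea of stacking cluster masks from several runs can be illustrated with a simplified sketch. The abstract does not give the exact SCE formulas, so the choices here are assumptions: a pixel-wise mean as the agreement score, the Jaccard index as the mask similarity, and a majority threshold for the consensus region. The function names and the tiny 3x3 masks are hypothetical.

```python
def stack_masks(masks):
    """Pixel-wise mean of equally shaped binary masks (lists of lists of 0/1)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


def jaccard(a, b):
    """Intersection over union of two binary masks; 1.0 if both are empty."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


def consensus(masks, threshold=0.5):
    """Keep pixels that more than `threshold` of the ensemble assigns to the cluster."""
    return [[int(v > threshold) for v in row] for row in stack_masks(masks)]


# Three hypothetical masks of the "same" cluster from three clustering runs.
m1 = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
m2 = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
m3 = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]

agreement = stack_masks([m1, m2, m3])  # per-pixel fraction of runs that agree
region = consensus([m1, m2, m3])       # majority-vote region of interest
similarity = jaccard(m1, m2)           # 3 shared pixels / 4 in the union = 0.75
```

The per-pixel agreement plays the role of a reliability score: pixels near 1.0 are assigned to the cluster by every run, while boundary pixels that only some runs include get intermediate values.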

Exploring helical dynamos with machine learning

May 24, 2019

Farrukh Nauman, Joonas Nättilä

Abstract: We use ensemble machine learning algorithms to study the evolution of magnetic fields in magnetohydrodynamic (MHD) turbulence that is helically forced. Using mean field formalism, we model the electromotive force (EMF) as both a linear and a non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF and that the use of more sophisticated algorithms (random forests, MCMC) does not lead to significant improvement in the fits. We conclude that the data we are looking at is effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source IPython notebooks.

* formatting changes/typo corr., 10 pages, 6 figures, 3 tables, comments welcome, data + IPython notebooks: https://github.com/fnauman/ML_alpha2
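
The linear case described in the abstract, an EMF modeled as a linear function of the mean field and the current density, can be sketched as a two-coefficient ridge regression. This is only an illustration and not the authors' notebooks: the function name `ridge_fit_2d`, the coefficient values 0.8 and -0.3, the noise level, and the synthetic data are all made up, and the fit is done per component without an intercept.

```python
import random


def ridge_fit_2d(B, J, E, lam=1e-3):
    """Fit E ~ a*B + b*J by ridge regression via the 2x2 normal equations."""
    sbb = sum(b * b for b in B) + lam   # regularized diagonal terms
    sjj = sum(j * j for j in J) + lam
    sbj = sum(b * j for b, j in zip(B, J))
    sbe = sum(b * e for b, e in zip(B, E))
    sje = sum(j * e for j, e in zip(J, E))
    det = sbb * sjj - sbj * sbj
    a = (sbe * sjj - sje * sbj) / det   # Cramer's rule for the 2x2 system
    b = (sje * sbb - sbe * sbj) / det
    return a, b


# Synthetic stand-in data: a known linear relation plus small Gaussian noise.
rng = random.Random(0)
B = [rng.gauss(0.0, 1.0) for _ in range(500)]
J = [rng.gauss(0.0, 1.0) for _ in range(500)]
E = [0.8 * b - 0.3 * j + rng.gauss(0.0, 0.01) for b, j in zip(B, J)]

a_fit, b_fit = ridge_fit_2d(B, J, E)
```

When the underlying relation really is linear, as in this synthetic case, the fit recovers the two coefficients closely, which is the sense in which a simple regularized model can be sufficient.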
