Computer vision and machine learning tools offer an exciting new way for automatically analyzing and categorizing information from complex computer simulations. Here we design an ensemble machine learning framework that can independently and robustly categorize and dissect simulation data output contents of turbulent flow patterns into distinct structure catalogues. The segmentation is performed using an unsupervised clustering algorithm, which segments physical structures by grouping together similar pixels in simulation images. The accuracy and robustness of the resulting segment region boundaries are enhanced by combining information from multiple simultaneously-evaluated clustering operations. The stacking of object segmentation evaluations is performed using image mask combination operations. This statistically-combined ensemble (SCE) of different cluster masks allows us to construct cluster reliability metrics for each pixel and for the associated segments without any prior user input. By comparing the similarity of different cluster occurrences in the ensemble, we can also assess the optimal number of clusters needed to describe the data. Furthermore, by relying on ensemble-averaged spatial segment region boundaries, the SCE method enables reconstruction of more accurate and robust region of interest (ROI) boundaries for the different image data clusters. We apply the SCE algorithm to 2-dimensional simulation data snapshots of magnetically-dominated fully-kinetic turbulent plasma flows where accurate ROI boundaries are needed for geometrical measurements of intermittent flow structures known as current sheets.
We use ensemble machine learning algorithms to study the evolution of magnetic fields in magnetohydrodynamic (MHD) turbulence that is helically forced. Using mean field formalism, we model the electromotive force (EMF) both as a linear and non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF and the use of more sophisticated algorithms (random forests, MCMC) do not lead to significant improvement in the fits. We conclude that the data we are looking at is effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source \textsc{IPython} notebooks.