Tanny Chavez

Unlocking Latent Dimensions: Exploring Representations of Large-Scale X-ray Scattering Data using Variational Autoencoders

Jun 12, 2026

Monika Choudhary, Xiaoya Chong, Runbo Jiang, Wiebke Koepp, Petrus H. Zwart, Damon English, Gregory M. Su, Eric Schaible, Chenhui Zhu, Mostafa Nassr(+18 more)

Abstract: Scientific user facilities generate X-ray scattering data faster than traditional workflows can process them. We address this challenge across two settings, offline dataset exploration and live on-the-fly analysis. We train a domain-specific attention-based Convolutional Variational Autoencoder (C-VAE) on 1.5 million X-ray scattering images to learn low-dimensional representations capturing structural variation across diverse experimental conditions. The learned latent space reveals well-organized clusters and smooth trajectories reflecting experimental progression. It further supports controlled synthetic scattering image generation across diverse structural states. When deployed without retraining, the model organizes time-resolved film formation experiments at two synchrotron facilities into interpretable latent structures. Benchmarking against DINOv3 (ViT-7B), a general-purpose vision foundation model, demonstrates that domain-specific training yields more interpretable latent organization for scattering data. Both workflows are integrated within Latent Space Explorer, a component of the MLExchange platform, supporting interactive structural exploration across archived datasets and live experiments.

Generating Realistic X-ray Scattering Images Using Stable Diffusion and Human-in-the-loop Annotations

Aug 22, 2024

Zhuowen Zhao, Xiaoya Chong, Tanny Chavez, Alexander Hexemer

Abstract: We fine-tuned a foundational stable diffusion model using X-ray scattering images and their corresponding descriptions to generate new scientific images from given prompts. However, some of the generated images exhibit significant unrealistic artifacts, commonly known as "hallucinations". To address this issue, we trained various computer vision models on a dataset composed of 60% human-approved generated images and 40% experimental images to detect unrealistic images. The classified images were then reviewed and corrected by human experts, and subsequently used to further refine the classifiers in subsequent rounds of training and inference. Our evaluations demonstrate the feasibility of generating high-fidelity, domain-specific images using a fine-tuned diffusion model. We anticipate that generative AI will play a crucial role in enhancing data augmentation and driving the development of digital twins in scientific research facilities.

DLSIA: Deep Learning for Scientific Image Analysis

Aug 02, 2023

Eric J Roberts, Tanny Chavez, Alexander Hexemer, Petrus H. Zwart

Abstract: We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures for a wide variety of tasks in image analysis to be used in downstream data processing, or for experiment-in-the-loop computing scenarios. DLSIA features easy-to-use architectures such as autoencoders, tunable U-Nets, and parameter-lean mixed-scale dense networks (MSDNets). Additionally, we introduce sparse mixed-scale networks (SMSNets), generated using random graphs and sparse connections. As experimental data continues to grow in scale and complexity, DLSIA provides accessible CNN construction and abstracts CNN complexities, allowing scientists to tailor their machine learning approaches, accelerate discoveries, foster interdisciplinary collaboration, and advance research in scientific image analysis.

* 10 pages, two column, 9 figures, 1 Supplementary section, IEEE conference

MLExchange: A web-based platform enabling exchangeable machine learning workflows

Aug 23, 2022

Zhuowen Zhao, Tanny Chavez, Elizabeth Holman, Guanhua Hao, Adam Green, Harinarayan Krishnan, Dylan McReynolds, Ronald Pandolfi, Eric J. Roberts, Petrus H. Zwart(+7 more)

Abstract: Machine learning (ML) algorithms are increasingly helping scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience where managing and exchanging ML algorithms, workflows, and data are readily available through web applications. So far, we have built four major components, i.e., the central job manager, the centralized content registry, the user portal, and the search engine, and successfully deployed these components on a testing server. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed on servers of different scales, ranging from a laptop (usually a single user) to high-performance computing (HPC) clusters accessed (simultaneously) by many users. Thus, MLExchange supports flexible usage scenarios: users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.

* Submitting to The Int'l Conference for High Performance Computing, Networking, Storage, and Analysis; revised the title
