Abstract:Reproducibility of computational results remains a challenge in materials science, as simulation workflows and parameters are often reported only in unstructured text and tables. While literature data are valuable for validation and reuse, the lack of machine-readable workflow descriptions prevents large-scale curation and systematic comparison. Existing text-mining approaches are insufficient to extract complete computational workflows with their associated parameters. An ontology-driven, large language model (LLM)-assisted framework is introduced for the automated extraction and structuring of computational workflows from the literature. The approach focuses on density functional theory-based stacking fault energy (SFE) calculations in hexagonal close-packed magnesium and its binary alloys, and uses a multi-stage filtering strategy together with prompt-engineered LLM extraction applied to method sections and tables. Extracted information is unified into a canonical schema and aligned with established materials ontologies (CMSO, ASMO, and PLDO), enabling the construction of a knowledge graph using atomRDF. The resulting knowledge graph enables systematic comparison of reported SFE values and supports the structured reuse of computational protocols. While full computational reproducibility is still constrained by missing or implicit metadata, the framework provides a foundation for organizing and contextualizing published results in a semantically interoperable form, thereby improving transparency and reusability of computational materials data.
Abstract:Partial differential equations (PDEs) are a central tool for modeling the dynamics of physical, engineering, and materials systems, but high-fidelity simulations are often computationally expensive. At the same time, many scientific applications can be viewed as the evolution of spatially distributed fields, making data-driven forecasting of such fields a core task in scientific machine learning. In this work we study autoregressive deep-learning surrogates for two-dimensional PDE dynamics on periodic domains, focusing on generalization to out-of-distribution initial conditions within a fixed PDE and parameter regime and on strict small-data settings with at most $\mathcal{O}(10^2)$ simulated trajectories per system. We introduce a multi-channel U-Net [...], evaluate it on five qualitatively different PDE families and compare it to ViT, AFNO, PDE-Transformer, and KAN-UNet under a common training setup. Across all datasets, me-UNet matches or outperforms these more complex architectures in terms of field-space error, spectral similarity, and physics-based metrics for in-distribution rollouts, while requiring substantially less training time. It also generalizes qualitatively to unseen initial conditions with as few as $\approx 20$ training simulations. A data-efficiency study and Grad-CAM analysis further suggest that, in small-data periodic 2D PDE settings, convolutional architectures with inductive biases aligned to locality and periodic boundary conditions remain strong contenders for accurate and moderately out-of-distribution-robust surrogate modeling.




Abstract:In this work, we explore the potential of self-supervised learning from unlabeled electron microscopy datasets, taking a step toward building a foundation model in this field. We show how self-supervised pretraining facilitates efficient fine-tuning for a spectrum of downstream tasks, including semantic segmentation, denoising, noise & background removal, and super-resolution. Experimentation with varying model complexities and receptive field sizes reveals the remarkable phenomenon that fine-tuned models of lower complexity consistently outperform more complex models with random weight initialization. We demonstrate the versatility of self-supervised pretraining across various downstream tasks in the context of electron microscopy, allowing faster convergence and better performance. We conclude that self-supervised pretraining serves as a powerful catalyst, being especially advantageous when limited annotated data are available and efficient scaling of computational cost are important.




Abstract:Detecting and analyzing various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms as well as tailoring the production processes. Analysis of microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the permanently increasing amount of data that is produced by experiments, handling these tasks manually becomes more and more impossible. In this work, we combine various image analysis and data mining techniques for creating a robust and accurate, automated image analysis pipeline. This allows for extracting the type and position of all defects in a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.




Abstract:Crystalline materials, such as metals and semiconductors, nearly always contain a special defect type called dislocation. This defect decisively determines many important material properties, e.g., strength, fracture toughness, or ductility. Over the past years, significant effort has been put into understanding dislocation behavior across different length scales via experimental characterization techniques and simulations. This paper introduces the dislocation ontology (DISO), which defines the concepts and relationships related to linear defects in crystalline materials. We developed DISO using a top-down approach in which we start defining the most general concepts in the dislocation domain and subsequent specialization of them. DISO is published through a persistent URL following W3C best practices for publishing Linked Data. Two potential use cases for DISO are presented to illustrate its usefulness in the dislocation dynamics domain. The evaluation of the ontology is performed in two directions, evaluating the success of the ontology in modeling a real-world domain and the richness of the ontology.
Abstract:In recent years, there has been a growing interest in accelerated materials innovation in both, research and industry. However, to truly add value to the development of new advanced materials, it is inevitable to take into account manufacturing processes and thereby tailor materials design approaches to support downstream process design approaches. As a major step into this direction, we present a holistic optimization approach that covers the entire materials process-structure-property chain. Our approach specifically employs machine learning techniques to address two critical identification problems. The first is to solve a materials design problem, which involves identifying near-optimal material structures that exhibit desired macroscopic properties. The second is to solve a process design problem that is to find an optimal processing path to manufacture these material structures. Both identification problems are typically ill-posed, which presents a significant challenge for solution approaches. However, the non-unique nature of these problems also offers an important advantage for processing: By having several target structures that perform similarly well, the corresponding processes can be efficiently guided towards manufacturing the best reachable structure. In particular, we apply deep reinforcement learning for process design in combination with a multi-task learning-based optimization approach for materials design. The functionality of the approach will be demonstrated by using it to manufacture crystallographic textures with desired properties in a metal forming process.
Abstract:Research in the field of Materials Science and Engineering focuses on the design, synthesis, properties, and performance of materials. An important class of materials that is widely investigated are crystalline materials, including metals and semiconductors. Crystalline material typically contains a distinct type of defect called "dislocation". This defect significantly affects various material properties, including strength, fracture toughness, and ductility. Researchers have devoted a significant effort in recent years to understanding dislocation behavior through experimental characterization techniques and simulations, e.g., dislocation dynamics simulations. This paper presents how data from dislocation dynamics simulations can be modeled using semantic web technologies through annotating data with ontologies. We extend the already existing Dislocation Ontology by adding missing concepts and aligning it with two other domain-related ontologies (i.e., the Elementary Multi-perspective Material Ontology and the Materials Design Ontology) allowing for representing the dislocation simulation data efficiently. Moreover, we show a real-world use case by representing the discrete dislocation dynamics data as a knowledge graph (DisLocKG) that illustrates the relationship between them. We also developed a SPARQL endpoint that brings extensive flexibility to query DisLocKG.




Abstract:In this study, Cu-Cr composites were studied by nanoindentation. Arrays of indents were placed over large areas of the samples resulting in datasets consisting of several hundred measurements of Young's modulus and hardness at varying indentation depths. The unsupervised learning technique, Gaussian mixture model, was employed to analyze the data, which helped to determine the number of "mechanical phases" and the respective mechanical properties. Additionally, a cross-validation approach was introduced to infer whether the data quantity was adequate and to suggest the amount of data required for reliable predictions -- one of the often encountered but difficult to resolve issues in machine learning of materials science problems.




Abstract:Quantitative Transmission Electron Microscopy (TEM) during in-situ straining experiment is able to reveal the motion of dislocations -- linear defects in the crystal lattice of metals. In the domain of materials science, the knowledge about the location and movement of dislocations is important for creating novel materials with superior properties. A long-standing problem, however, is to identify the position and extract the shape of dislocations, which would ultimately help to create a digital twin of such materials. In this work, we quantitatively compare state-of-the-art instance segmentation methods, including Mask R-CNN and YOLOv8. The dislocation masks as the results of the instance segmentation are converted to mathematical lines, enabling quantitative analysis of dislocation length and geometry -- important information for the domain scientist, which we then propose to include as a novel length-aware quality metric for estimating the network performance. Our segmentation pipeline shows a high accuracy suitable for all domain-specific, further post-processing. Additionally, our physics-based metric turns out to perform much more consistently than typically used pixel-wise metrics.




Abstract:Determining, understanding, and predicting the so-called structure-property relation is an important task in many scientific disciplines, such as chemistry, biology, meteorology, physics, engineering, and materials science. Structure refers to the spatial distribution of, e.g., substances, material, or matter in general, while property is a resulting characteristic that usually depends in a non-trivial way on spatial details of the structure. Traditionally, forward simulations models have been used for such tasks. Recently, several machine learning algorithms have been applied in these scientific fields to enhance and accelerate simulation models or as surrogate models. In this work, we develop and investigate the applications of six machine learning techniques based on two different datasets from the domain of materials science: data from a two-dimensional Ising model for predicting the formation of magnetic domains and data representing the evolution of dual-phase microstructures from the Cahn-Hilliard model. We analyze the accuracy and robustness of all models and elucidate the reasons for the differences in their performances. The impact of including domain knowledge through tailored features is studied, and general recommendations based on the availability and quality of training data are derived from this.