The pursuit of novel alloys tailored to specific requirements poses significant challenges for researchers in the field. This underscores the importance of developing predictive techniques for essential physical properties of alloys based on their chemical composition and processing parameters. This study introduces AlloyBERT, a transformer encoder-based model designed to predict properties such as elastic modulus and yield strength of alloys using textual inputs. Leveraging the pre-trained RoBERTa encoder model as its foundation, AlloyBERT employs self-attention mechanisms to establish meaningful relationships between words, enabling it to interpret human-readable input and predict target alloy properties. By combining a tokenizer trained on our textual data and a RoBERTa encoder pre-trained and fine-tuned for this specific task, we achieved a mean squared error (MSE) of 0.00015 on the Multi Principal Elemental Alloys (MPEA) data set and 0.00611 on the Refractory Alloy Yield Strength (RAYS) dataset. This surpasses the performance of shallow models, which achieved a best-case MSE of 0.00025 and 0.0076 on the MPEA and RAYS datasets respectively. Our results highlight the potential of language models in material science and establish a foundational framework for text-based prediction of alloy properties that does not rely on complex underlying representations, calculations, or simulations.
Neural solvers for partial differential equations (PDEs) have great potential, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs, which may encompass different coefficients, geometries, or equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for PDEs. Through self-supervised learning across PDEs, masked autoencoders can learn useful latent representations for downstream tasks. In particular, masked pretraining can improve coefficient regression and timestepping performance of neural solvers on unseen equations. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.
Tactile information is a critical tool for fine-grain manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on complex tasks. To do this, we tackle the challenge of plugging in a USB cable, a dexterous manipulation task that relies on fine-grain visuo-tactile serving. By incorporating tactile information into imitation learning frameworks, we are able to train a robotic agent to plug in a USB cable - a first for imitation learning. Additionally, we explore how tactile information can be used to train non-tactile agents through a contrastive-loss pretraining process. Our results show that by pretraining with tactile information, the performance of a non-tactile agent can be significantly improved, reaching a level on par with visuo-tactile agents. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-cable-plugging/home
Manipulating deformable objects remains a challenge within robotics due to the difficulties of state estimation, long-horizon planning, and predicting how the object will deform given an interaction. These challenges are the most pronounced with 3D deformable objects. We propose SculptDiff, a goal-conditioned diffusion-based imitation learning framework that works with point cloud state observations to directly learn clay sculpting policies for a variety of target shapes. To the best of our knowledge this is the first real-world method that successfully learns manipulation policies for 3D deformable objects. For sculpting videos and access to our dataset and hardware CAD models, see the project website: https://sites.google.com/andrew.cmu.edu/imitation-sculpting/home
Neural networks have shown promising potential in accelerating the numerical simulation of systems governed by partial differential equations (PDEs). Different from many existing neural network surrogates operating on high-dimensional discretized fields, we propose to learn the dynamics of the system in the latent space with much coarser discretizations. In our proposed framework - Latent Neural PDE Solver (LNS), a non-linear autoencoder is first trained to project the full-order representation of the system onto the mesh-reduced space, then a temporal model is trained to predict the future state in this mesh-reduced space. This reduction process simplifies the training of the temporal model by greatly reducing the computational cost accompanying a fine discretization. We study the capability of the proposed framework and several other popular neural PDE solvers on various types of systems including single-phase and multi-phase flows along with varying system parameters. We showcase that it has competitive accuracy and efficiency compared to the neural PDE solver that operates on full-order space.
Fluid data completion is a research problem with high potential benefit for both experimental and computational fluid dynamics. An effective fluid data completion method reduces the required number of sensors in a fluid dynamics experiment, and allows a coarser and more adaptive mesh for a Computational Fluid Dynamics (CFD) simulation. However, the ill-posed nature of the fluid data completion problem makes it prohibitively difficult to obtain a theoretical solution and presents high numerical uncertainty and instability for a data-driven approach (e.g., a neural network model). To address these challenges, we leverage recent advancements in computer vision, employing the vector quantization technique to map both complete and incomplete fluid data spaces onto discrete-valued lower-dimensional representations via a two-stage learning procedure. We demonstrated the effectiveness of our approach on Kolmogorov flow data (Reynolds number: 1000) occluded by masks of different size and arrangement. Experimental results show that our proposed model consistently outperforms benchmark models under different occlusion settings in terms of point-wise reconstruction accuracy as well as turbulent energy spectrum and vorticity distribution.
We propose a mask pretraining method for Graph Neural Networks (GNNs) to improve their performance on fitting potential energy surfaces, particularly in water systems. GNNs are pretrained by recovering spatial information related to masked-out atoms from molecules, then transferred and finetuned on atomic forcefields. Through such pretraining, GNNs learn meaningful prior about structural and underlying physical information of molecule systems that are useful for downstream tasks. From comprehensive experiments and ablation studies, we show that the proposed method improves the accuracy and convergence speed compared to GNNs trained from scratch or using other pretraining techniques such as denoising. On the other hand, our pretraining method is suitable for both energy-centric and force-centric GNNs. This approach showcases its potential to enhance the performance and data efficiency of GNNs in fitting molecular force fields.
Neural operators have recently grown in popularity as Partial Differential Equation (PDEs) surrogate models. Learning solution functionals, rather than functions, has proven to be a powerful approach to calculate fast, accurate solutions to complex PDEs. While much work has been done evaluating neural operator performance on a wide variety of surrogate modeling tasks, these works normally evaluate performance on a single equation at a time. In this work, we develop a novel contrastive pretraining framework utilizing Generalized Contrastive Loss that improves neural operator generalization across multiple governing equations simultaneously. Governing equation coefficients are used to measure ground-truth similarity between systems. A combination of physics-informed system evolution and latent-space model output are anchored to input data and used in our distance function. We find that physics-informed contrastive pretraining improves both accuracy and generalization for the Fourier Neural Operator in fixed-future task, with comparable performance on the autoregressive rollout, and superresolution tasks for the 1D Heat, Burgers', and linear advection equations.
The growth of deep learning in the past decade has motivated important applications to smart manufacturing and machine health monitoring. In particular, vibration data offers a rich and reliable source to provide meaningful insights into machine health and predictive maintenance. In this work, we present a Transformer based framework for analyzing vibration signals to predict different types of bearing faults (FaultFormer). In particular, we process signal data using data augmentations and extract their Fourier modes to train a transformer encoder to achieve state of the art accuracies. The attention mechanism as well as model outputs were analyzed to confirm the transformer's ability to automatically extract features within signals and learn both global and local relationships to make classifications. Lastly, two pretraining strategies were proposed to pave the way for large, generalizable transformers that could adapt to new data, situations, or machinery on the production floor.
Defects in laser powder bed fusion (L-PBF) parts often result from the meso-scale dynamics of the molten alloy near the laser, known as the melt pool. For instance, the melt pool can directly contribute to the formation of undesirable porosity, residual stress, and surface roughness in the final part. Experimental in-situ monitoring of the three-dimensional melt pool physical fields is challenging, due to the short length and time scales involved in the process. Multi-physics simulation methods can describe the three-dimensional dynamics of the melt pool, but are computationally expensive at the mesh refinement required for accurate predictions of complex effects, such as the formation of keyhole porosity. Therefore, in this work, we develop a generative deep learning model based on the probabilistic diffusion framework to map low-fidelity, coarse-grained simulation information to the high-fidelity counterpart. By doing so, we bypass the computational expense of conducting multiple high-fidelity simulations for analysis by instead upscaling lightweight coarse mesh simulations. Specifically, we implement a 2-D diffusion model to spatially upscale cross-sections of the coarsely simulated melt pool to their high-fidelity equivalent. We demonstrate the preservation of key metrics of the melting process between the ground truth simulation data and the diffusion model output, such as the temperature field, the melt pool dimensions and the variability of the keyhole vapor cavity. Specifically, we predict the melt pool depth within 3 $\mu m$ based on low-fidelity input data 4$\times$ coarser than the high-fidelity simulations, reducing analysis time by two orders of magnitude.