Easy access to large-scale experimental and computational materials data has accelerated the development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, the lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web-based materials informatics toolbox for materials discovery. It includes a variety of routinely needed tools for exploratory materials discovery: materials composition and structure checks (e.g. for neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g. band gap, elastic moduli, hardness, thermal conductivity), and search for hypothetical materials. These user-friendly tools can be freely accessed at \url{www.materialsatlas.org}. We argue that such materials informatics apps should be widely developed by the community to speed up the materials discovery process.
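As an illustration of what a composition neutrality check can look like, the following is a minimal sketch, not the MaterialsAtlas.org implementation. It enumerates one oxidation state per element and tests whether any combination sums to zero. The table `COMMON_OXIDATION_STATES` and the function `is_charge_neutral` are hypothetical names introduced here, and the table is a small illustrative subset rather than a complete reference.

```python
from itertools import product

# Illustrative subset of common oxidation states (an assumption for this
# sketch; a real tool would use a complete, curated table).
COMMON_OXIDATION_STATES = {
    "Sr": [2],
    "Ti": [2, 3, 4],
    "Fe": [2, 3],
    "Na": [1],
    "O": [-2],
    "Cl": [-1],
}


def is_charge_neutral(composition):
    """Return True if some choice of one oxidation state per element
    makes the total charge of the formula unit zero.

    composition maps element symbol -> number of atoms per formula unit.
    """
    elements = list(composition)
    state_choices = [COMMON_OXIDATION_STATES[el] for el in elements]
    for states in product(*state_choices):
        total = sum(q * composition[el] for q, el in zip(states, elements))
        if total == 0:
            return True
    return False


if __name__ == "__main__":
    print(is_charge_neutral({"Sr": 1, "Ti": 1, "O": 3}))  # SrTiO3
    print(is_charge_neutral({"Na": 1, "Cl": 2}))          # NaCl2
```

The brute-force enumeration is adequate for formulas with a handful of elements; it assumes every atom of an element shares the same oxidation state, which excludes mixed-valence compounds.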
In this paper, a family of novel diffusion adaptive estimation algorithms is proposed from the asymmetric cost function perspective by combining the diffusion strategy with the linear-linear cost (LLC), quadratic-quadratic cost (QQC), and linear-exponential cost (LEC) at all distributed network nodes; the resulting algorithms are named diffusion LLCLMS (DLLCLMS), diffusion QQCLMS (DQQCLMS), and diffusion LECLMS (DLECLMS), respectively. Then the stability of the mean estimation error and the computational complexity of the three diffusion algorithms are analyzed theoretically. Finally, several simulation experiments are designed to verify the superiority of the three proposed diffusion algorithms. The simulation results show that the DLLCLMS, DQQCLMS, and DLECLMS algorithms are more robust to the input signal and impulsive noise than the DSELMS, DRVSSLMS, and DLLAD algorithms. In brief, theoretical analysis and experimental results show that the proposed DLLCLMS, DQQCLMS, and DLECLMS algorithms have superior performance when estimating an unknown linear system under changing impulsive noise environments and different types of input signals.
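To make the idea of an asymmetric cost concrete, the sketch below writes out textbook forms of the three costs and one adapt-then-combine diffusion step driven by the QQC derivative. The abstract does not give the exact parameterization or update rules, so the forms, the parameters `a` and `b`, the step size `mu`, and the function names are all assumptions made for illustration.

```python
import math


def llc(e, a=1.0, b=2.0):
    """Linear-linear cost: slope a for non-negative errors, slope b otherwise."""
    return a * e if e >= 0 else -b * e


def qqc(e, a=1.0, b=2.0):
    """Quadratic-quadratic cost: curvature a for e >= 0, curvature b otherwise."""
    return a * e * e if e >= 0 else b * e * e


def lec(e, a=0.5, b=1.0):
    """Linear-exponential (LINEX-type) cost: roughly exponential on one
    side of zero and roughly linear on the other."""
    return b * (math.exp(a * e) - a * e - 1.0)


def qqc_derivative(e, a=1.0, b=2.0):
    """Derivative of the quadratic-quadratic cost with respect to e."""
    return 2.0 * a * e if e >= 0 else 2.0 * b * e


def atc_step(weights, inputs, desired, combine, mu=0.01):
    """One adapt-then-combine diffusion step over all nodes.

    weights[k]    current estimate at node k (list of floats)
    inputs[k]     regressor vector observed at node k
    desired[k]    scalar measurement at node k
    combine[l][k] weight node k assigns to node l's intermediate estimate
                  (each column is assumed to sum to 1)
    """
    n = len(weights)
    dim = len(weights[0])
    # Adapt: each node takes a stochastic-gradient step on its own cost.
    psi = []
    for k in range(n):
        e = desired[k] - sum(w * x for w, x in zip(weights[k], inputs[k]))
        g = qqc_derivative(e)
        psi.append([w + mu * g * x for w, x in zip(weights[k], inputs[k])])
    # Combine: each node averages the intermediate estimates of its neighbors.
    return [
        [sum(combine[l][k] * psi[l][i] for l in range(n)) for i in range(dim)]
        for k in range(n)
    ]
```

With `a` different from `b`, positive and negative errors are penalized differently, which is the property the abstract refers to as asymmetry; swapping `qqc_derivative` for the derivative of `llc` or `lec` would give the corresponding variants of this sketch.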
Active learning has been increasingly applied to screening existing materials databases for functional materials with desired properties. However, the number of known materials deposited in popular materials databases such as ICSD and Materials Project is extremely limited and covers only a tiny portion of the vast chemical design space. Herein we present an active generative inverse design method that combines active learning with a deep variational autoencoder neural network and a generative adversarial deep neural network model to discover new materials with a target property in the whole chemical design space. The application of this method has allowed us to discover new thermodynamically stable materials with high band gap (SrYF$_5$) and semiconductors with specified band gap ranges (SrClF$_3$, CaClF$_5$, YCl$_3$, SrC$_2$F$_3$, AlSCl, As$_2$O$_3$), all of which are verified by first-principles DFT calculations. Our experiments show that while active learning itself may sample chemically infeasible candidates, these samples help to train effective screening models for identifying materials with desired properties among the hypothetical materials created by the generative model. The experiments show the effectiveness of our active generative inverse design approach.
Two-dimensional (2D) materials have emerged as promising functional materials with many applications such as semiconductors and photovoltaics because of their unique optoelectronic properties. While several thousand 2D materials have been screened in existing materials databases, discovering new 2D materials remains challenging. Herein we propose a deep learning generative model for composition generation combined with a random-forest-based 2D materials classifier to discover new hypothetical 2D materials. Furthermore, a template-based element substitution structure prediction approach is developed to predict the crystal structures of a subset of the newly predicted hypothetical formulas, which allows us to confirm their structural stability using DFT calculations. So far, we have discovered 267,489 new potential 2D materials compositions and confirmed twelve 2D/layered materials by DFT formation energy calculation. Our results show that generative machine learning models provide an effective way to explore the vast chemical design space for new 2D materials discovery.
Reverse-engineering bar charts extracts textual and numeric information from the visual representations of bar charts to support application scenarios that require the underlying information. In this paper, we propose a neural network-based method for reverse-engineering bar charts. We adopt a neural network-based object detection model to simultaneously localize and classify textual information. This approach improves the efficiency of textual information extraction. We design an encoder-decoder framework that integrates convolutional and recurrent neural networks to extract numeric information. We further introduce an attention mechanism into the framework to achieve high accuracy and robustness. Synthetic and real-world datasets are used to evaluate the effectiveness of the method. To the best of our knowledge, this work takes the lead in constructing a complete neural network-based method of reverse-engineering bar charts.
Materials representation plays a key role in machine learning based prediction of materials properties and new materials discovery. Currently both graph and 3D voxel representation methods are based on the heterogeneous elements of the crystal structures. Here, we propose to use electronic charge density (ECD) as a generic unified 3D descriptor for materials property prediction, with the advantage that it is closely related to the physical and chemical properties of materials. We developed ECD-based 3D convolutional neural networks (CNNs) for predicting elastic properties of materials, in which the CNNs can learn effective hierarchical features through multiple convolution and pooling operations. Extensive benchmark experiments over 2,170 Fm-3m face-centered-cubic (FCC) materials show that our ECD-based CNNs can achieve good performance for elasticity prediction. In particular, our CNN models based on the fusion of elemental Magpie features and ECD descriptors achieved the best 5-fold cross-validation performance. More importantly, we showed that our ECD-based CNN models can achieve significantly better extrapolation performance when evaluated over non-redundant datasets where there are few neighboring training samples around the test samples. As additional validation, we evaluated the predictive performance of our models on 329 materials of space group Fm-3m by comparison with DFT-calculated values; the models predict bulk modulus better than shear modulus. Owing to the unified representation power of ECD, we expect that our ECD-based CNN approach can also be applied to predict other physical and chemical properties of crystalline materials.
Machine learning (ML) methods have gained increasing popularity in exploring and developing new materials. More specifically, graph neural networks (GNNs) have been applied to predicting material properties. In this work, we develop a novel model, GATGNN, for predicting inorganic material properties based on graph neural networks composed of multiple graph-attention layers (GAT) and a global attention layer. Through the application of the GAT layers, our model can efficiently learn the complex bonds shared among the atoms within each atom's local neighborhood. Subsequently, the global attention layer provides a weight coefficient for each atom in the inorganic crystal material, and these coefficients are used to considerably improve our model's performance. Notably, with the development of our GATGNN model, we show that our method is able both to outperform the previous models' predictions and to provide insight into the crystallization of the material.
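The role of per-atom weight coefficients can be illustrated with a generic attention-pooling sketch. This is not the GATGNN layer itself, whose details are not given in the abstract: here each atom has a feature vector and a scalar score (in a real model the score would come from a learned function), a softmax turns the scores into weights, and the crystal-level vector is the weighted sum. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def global_attention_pool(atom_features, atom_scores):
    """Pool per-atom feature vectors into one crystal-level vector.

    atom_features: list of equal-length feature vectors, one per atom
    atom_scores:   one scalar score per atom (assumed to be produced
                   elsewhere, e.g. by a small learned network)
    Returns (pooled_vector, weights); the weights sum to 1.
    """
    weights = softmax(atom_scores)
    dim = len(atom_features[0])
    pooled = [
        sum(w * f[i] for w, f in zip(weights, atom_features))
        for i in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = global_attention_pool(feats, [0.2, 1.5, -0.3])
    print(pooled, weights)
```

Because the weights are explicit and normalized, they can be inspected after prediction to see which atoms contributed most to the pooled representation.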
Noncentrosymmetric materials play a critical role in many important applications such as laser technology, communication systems, quantum computing, and cybersecurity. However, the experimental discovery of new noncentrosymmetric materials is extremely difficult. Here we present a machine learning model that predicts whether the composition of a potential crystalline structure would be centrosymmetric or not. By evaluating a diverse set of composition features calculated using the matminer featurizer package coupled with different machine learning algorithms, we find that Random Forest Classifiers give the best performance for noncentrosymmetric material prediction, reaching an accuracy of 84.8% when evaluated with 10-fold cross-validation on a dataset of 82,506 samples extracted from Materials Project. A random forest model trained only on materials with 3 elements gives an even higher accuracy of 86.9%. We apply our ML model to screen potential noncentrosymmetric materials from 2,000,000 hypothetical materials generated by our inverse design engine and report the top 20 candidate noncentrosymmetric materials with 2 to 4 elements and the top 20 borate candidates.
This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real-world challenges. All components are integrated in a meeting transcription framework called SRD, which stands for "separate, recognize, and diarize". Experimental results using recordings of natural meetings involving up to 11 attendees are reported. The continuous speech separation improves the word error rate (WER) by 16.1% compared with a highly tuned beamformer. When a complete list of meeting attendees is available, the discrepancy between WER and speaker-attributed WER is only 1.0%, indicating accurate word-to-speaker association. This increases marginally to 1.6% when 50% of the attendees are unknown to the system.
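For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a hypothesis, divided by the number of reference words. It is a generic illustration, not the scoring pipeline used in the paper, and the function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Both arguments are plain strings; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(word_error_rate(ref, hyp))
```

A speaker-attributed variant would additionally count a word as wrong when it is assigned to the wrong speaker; that extension is not shown here.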