Limin Li

From Materials Database to Materials Bank: Assetizing Data for AI Driven Materials Innovation

Jun 30, 2026

Chenyao Ma, Di Zhang, Weibo Gong, Wei Du, Rui Su, Yuhang Chen, Kan Xu, Huan Gu, Limin Li, Piao Ma (+2 more)

Abstract: Driven by high-throughput experimentation, computational modeling, and artificial intelligence (AI), materials data has expanded at an unprecedented rate. Conventional materials databases function only as passive repositories, archiving raw experimental records indiscriminately, including both successful and failed data, without systematic value filtering or asset management. This creates a critical gap between massive data accumulation and actionable innovation, hindering the identification of high-potential materials and industrial translation. To address this bottleneck, we propose an industrialization-oriented Materials Bank, a dedicated value-filtering and assetization layer that operates beyond traditional databases. It does not merely curate high-quality data but systematically elevates qualified candidates into standardized, upgradable materials assets via a multi-dimensional BankCard framework covering scientific validity, synthesis feasibility, application readiness, and industrial value. By unifying databases, AI models, automated experimentation, and multi-criteria assessment into a cohesive closed-loop ecosystem, the Materials Bank establishes a clear trajectory from data to knowledge, candidate, asset, and product. It serves not as an enhanced database or screening tool, but as a decision infrastructure bridging academic discovery and industrial demand, offering a scalable paradigm to accelerate AI-driven materials innovation and deliver tangible real-world impact.

Empowering Polymeric Materials Discovery by Artificial Intelligence

Jun 18, 2026

Chenyao Ma, Linda Zhang, Yuheng Chen, Wei Du, Shangwen Fang, Zihao Jiang, Chuanyu Liu, Xinyu Ma, Rui Su, Gang Wang (+22 more)

Abstract: Polymeric materials underpin modern technologies spanning energy storage, microelectronics, healthcare and sustainable manufacturing. Yet their rational design remains exceptionally challenging because material performance emerges from complex interactions among molecular composition, chain architecture, processing history and hierarchical structural evolution across multiple length and time scales. Consequently, polymer research has long relied on labor-intensive experimentation and fragmented modeling approaches, limiting both mechanistic understanding and innovation efficiency. Recent advances in data infrastructure, machine learning, large artificial intelligence (AI) models and laboratory automation are beginning to reshape this landscape. Rather than functioning as isolated tools, polymer databases, predictive models, AI agents and automated laboratories are increasingly converging into interconnected discovery ecosystems. As a result, the central challenge is shifting from improving predictive accuracy alone to enabling reliable decision-making, adaptive learning and seamless integration across computation, experimentation and scientific reasoning. We argue that polymer science is entering an era of autonomous discovery, in which data, simulation, reasoning and experimentation operate within self-improving feedback loops that continuously generate hypotheses, design materials, execute experiments and refine predictive models. By unifying molecular design, process optimization, experimental validation and industrial translation, such autonomous ecosystems establish a more predictive, reproducible and scalable paradigm for polymer innovation, fundamentally transforming how polymer research is conducted.

Persistent Nonnegative Matrix Factorization via Multi-Scale Graph Regularization

Feb 26, 2026

Jichao Zhang, Ran Miao, Limin Li

Abstract: Matrix factorization techniques, especially Nonnegative Matrix Factorization (NMF), have been widely used for dimensionality reduction and interpretable data representation. However, existing NMF-based methods are inherently single-scale and fail to capture the evolution of connectivity structures across resolutions. In this work, we propose persistent nonnegative matrix factorization (pNMF), a scale-parameterized family of NMF problems that produces a sequence of persistence-aligned embeddings rather than a single one. By leveraging persistent homology, we identify a canonical minimal sufficient scale set at which the underlying connectivity undergoes qualitative changes. These canonical scales induce a sequence of graph Laplacians, leading to a coupled NMF formulation with scale-wise geometric regularization and an explicit cross-scale consistency constraint. We analyze the structural properties of the embeddings along the scale parameter and establish bounds on their increments between consecutive scales. The resulting model defines a nontrivial solution path across scales, rather than a single factorization, which poses new computational challenges. We develop a sequential alternating optimization algorithm with guaranteed convergence. Numerical experiments on synthetic and single-cell RNA sequencing datasets demonstrate the effectiveness of the proposed approach in multi-scale low-rank embeddings.

Soft causal learning for generalized molecule property prediction: An environment perspective

May 07, 2025

Limin Li, Kuo Yang, Wenjie Du, Pengkun Wang, Zhengyang Zhou, Yang Wang

Abstract: Learning on molecule graphs has become an increasingly important topic in AI for science, which takes full advantage of AI to facilitate scientific discovery. Existing solutions for modeling molecules utilize Graph Neural Networks (GNNs) to obtain representations, but they mostly fail to adapt models to out-of-distribution (OOD) samples. Although recent advances in OOD-oriented graph learning have discovered the invariant rationale on graphs, they still ignore three important issues: 1) the expanding atom patterns regarding environments on graphs lead to failures of invariant-rationale-based models, 2) the associations between discovered molecular subgraphs and corresponding properties are complex, so causal substructures cannot fully interpret the labels, and 3) environments and invariances influence each other and are thus challenging to model. To this end, we propose a soft causal learning framework to tackle the unresolved OOD challenge in molecular science, from the perspective of fully modeling the molecule environments and bypassing the invariant subgraphs. Specifically, we first incorporate chemistry theories into our graph growth generator to imitate expanded environments, then devise a GIB-based objective to disentangle the environment from whole graphs, and finally introduce a cross-attention based soft causal interaction, which allows dynamic interactions between environments and invariances. We perform experiments on seven datasets by imitating different kinds of OOD generalization scenarios. Extensive comparisons, ablation experiments, and visualized case studies demonstrate the strong generalization ability of our proposal.

* 23 pages, 7 figures, 3 tables
