Yuki Takemoto

for the RadonPy consortium

Omics-scale polymer computational database transferable to real-world artificial intelligence applications

Nov 07, 2025

Ryo Yoshida, Yoshihiro Hayashi, Hidemine Furuya, Ryohei Hosoya, Kazuyoshi Kaneko, Hiroki Sugisawa, Yu Kaneko, Aiko Takahashi, Yoh Noguchi, Shun Nanjo(+96 more)

Abstract: Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural language processing, materials science, particularly polymer research, has significantly lagged in developing extensive open datasets. This lag is primarily due to the high costs of polymer synthesis and property measurements, along with the vastness and complexity of the chemical space. This study presents PolyOmics, an omics-scale computational database generated through fully automated molecular dynamics simulation pipelines that provide diverse physical properties for over $10^5$ polymeric materials. The PolyOmics database is collaboratively developed by approximately 260 researchers from 48 institutions to bridge the gap between academia and industry. Machine learning models pretrained on PolyOmics can be efficiently fine-tuned for a wide range of real-world downstream tasks, even when only limited experimental data are available. Notably, the generalisation capability of these simulation-to-real transfer models improves significantly as the size of the PolyOmics database increases, exhibiting power-law scaling. The emergence of scaling laws supports the "more is better" principle, highlighting the significance of ultralarge-scale computational materials data for improving real-world prediction performance. This unprecedented omics-scale database reveals vast unexplored regions of polymer materials, providing a foundation for AI-driven polymer science.

* 65 pages, 11 figures
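
The power-law scaling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a relation of the form error ≈ a · n^(-alpha) between pretraining set size n and downstream error, and fits it by a straight line in log-log space. The function name `fit_power_law`, the functional form, and the synthetic numbers are illustrative assumptions; none of them come from the paper.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ≈ a * n**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic illustration only: these values are not results from the paper.
sizes = np.array([1e2, 1e3, 1e4, 1e5])
errors = 2.0 * sizes ** -0.3

a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

A straight line on a log-log plot of error against dataset size is the usual visual signature of this kind of scaling; the fitted `alpha` is the slope of that line with the sign flipped.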

Scaling Law for Large-Scale Pre-Training Using Chaotic Time Series and Predictability in Financial Time Series

Sep 05, 2025

Yuki Takemoto

Abstract: Time series forecasting plays a critical role in decision-making across diverse fields, including meteorology, traffic, electricity, economics, and finance. Predicting returns on financial instruments is especially challenging. Some researchers have proposed time series foundation models applicable to various forecasting tasks. At the same time, based on the recognition that real-world time series exhibit chaotic properties, methods have been developed to generate synthetic chaotic time series, construct diverse datasets, and train models on them. In this study, we propose a methodology for modeling financial time series by generating artificial chaotic time series and applying resampling techniques to simulate financial time series data, which we then use as training samples. Increasing the resampling interval to extend predictive horizons, we conducted large-scale pre-training using 10 billion training samples for each case. We subsequently created test datasets for multiple timeframes using actual Bitcoin trade data and performed zero-shot prediction without re-training the pre-trained model. Evaluating the profitability of a simple trading strategy based on these predictions showed significant performance improvements over autocorrelation models. During large-scale pre-training, we observed a scaling-law-like phenomenon: for chaotic time series, a given level of predictive performance can be achieved at extended predictive horizons by increasing the number of training samples exponentially. If this scaling law proves robust and holds across various chaotic models, it suggests the potential to predict near-future events by investing substantial computational resources. Future research should focus on further large-scale training and on verifying the applicability of this scaling law to diverse chaotic models.

* Patent pending
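
The data-generation idea in the abstract (synthetic chaotic series, resampled at a chosen interval, then used as training samples) can be sketched as follows. The abstract does not say which chaotic system or resampling scheme the author used, so the logistic map, the every-k-th-point resampling, and all function names and parameter values here are assumptions made purely for illustration.

```python
import numpy as np

def logistic_series(length, r=3.9, x0=0.4):
    """Generate a chaotic series from the logistic map x[t+1] = r*x[t]*(1-x[t]).

    The logistic map is a stand-in; the paper's chaotic model is not specified here.
    """
    x = np.empty(length)
    x[0] = x0
    for t in range(1, length):
        x[t] = r * x[t - 1] * (1.0 - x[t - 1])
    return x

def resample(series, interval):
    """Keep every `interval`-th point (assumed resampling scheme)."""
    return series[::interval]

def make_windows(series, context):
    """Split a series into (input window, next value) training pairs."""
    inputs = np.stack([series[i:i + context] for i in range(len(series) - context)])
    targets = series[context:]
    return inputs, targets

series = logistic_series(1000)
coarse = resample(series, interval=4)   # larger interval = longer effective horizon
X, y = make_windows(coarse, context=8)
print(X.shape, y.shape)
```

Increasing `interval` makes each prediction step span more iterations of the underlying map, which is one simple way to mimic the longer predictive horizons the abstract describes.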
