Mingsheng Long

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

May 03, 2024

Kaiyuan Chen, Xingzhuo Guo, Yu Zhang, Jianmin Wang, Mingsheng Long

Abstract: Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show the enhancement of prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated by the inherent property of diffusion models. We experimentally show that the precision weights effectively estimate the data predictability. We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and ERA surface wind datasets. Our results demonstrate that CogDPM outperforms both existing domain-specific operational models and general deep prediction models by providing more proficient forecasting.

Supercompiler Code Optimization with Zero-Shot Reinforcement Learning

Apr 24, 2024

Jialong Wu, Chaoyi Deng, Jianmin Wang, Mingsheng Long

Abstract: Effective code optimization in compilers plays a central role in computer and software engineering. While compilers can be made to automatically search the optimization space without the need for user intervention, this is not a standard practice since the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large data to produce effective optimization strategies instantly for each program in a single trial of the agent. To overcome the huge range of possible test programs, we prepare a large dataset of training programs that emphasize quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent in a sample-efficient manner through interacting with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates our agent's supercompiler performance and zero-shot generalization abilities, outperforming built-in optimization options designed by compiler experts. Our methodology kindles the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in the realm of code optimization.

depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

Mar 14, 2024

Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long

Abstract: PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, using the PyTorch compiler to its full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce depyf, a tool designed to demystify the inner workings of the PyTorch compiler. depyf decompiles bytecode generated by PyTorch back into equivalent source code, and establishes connections between in-memory code objects and their on-disk source code counterparts. This feature enables users to step through the source code line by line using debuggers, thus enhancing their understanding of the underlying processes. Notably, depyf is non-intrusive and user-friendly, primarily relying on two convenient context managers for its core functionality. The project is openly available (https://github.com/thuml/depyf) and is recognized as a PyTorch ecosystem project (https://pytorch.org/ecosystem/).

* 16 pages, 2 figures
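
To make "operates at the Python bytecode level" concrete, here is a minimal sketch that uses only the standard-library `dis` module to list the bytecode instructions CPython produces for a small function. It does not use depyf or the PyTorch compiler; the names `add_and_scale` and `opnames` are hypothetical and chosen only for illustration.

```python
import dis


def add_and_scale(a, b):
    # A tiny function whose bytecode we want to inspect.
    return (a + b) * 2


def opnames(func):
    # Return the names of the bytecode instructions CPython compiled for `func`.
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # The exact instruction names vary between CPython versions.
    for name in opnames(add_and_scale):
        print(name)
```

Reading such instruction listings directly is tedious, which is presumably why a tool that turns bytecode back into source code is useful for debugging.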

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

Feb 29, 2024

Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Yunzhong Qiu, Haoran Zhang, Jianmin Wang, Mingsheng Long

Abstract: Recent studies have demonstrated remarkable performance in time series forecasting. However, due to the partially-observed nature of real-world applications, solely focusing on the target of interest, so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded as multiple variables, where the exogenous series can provide valuable external information for endogenous variables. Thus, unlike prior well-established multivariate or univariate forecasting that either treats all the variables equally or overlooks exogenous information, this paper focuses on a practical setting, which is time series forecasting with exogenous variables. We propose a novel framework, TimeXer, to utilize external information to enhance the forecasting of endogenous variables. With a deftly designed embedding layer, TimeXer empowers the canonical Transformer architecture with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are employed. Moreover, a global endogenous variate token is adopted to effectively bridge the exogenous series into endogenous temporal patches. Experimentally, TimeXer significantly improves time series forecasting with exogenous variables and achieves consistent state-of-the-art performance in twelve real-world forecasting benchmarks.

TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling

Feb 04, 2024

Jiaxiang Dong, Haixu Wu, Yuxuan Wang, Yunzhong Qiu, Li Zhang, Jianmin Wang, Mingsheng Long

Abstract: Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks. Prior methods are mainly based on pre-training techniques well-acknowledged in vision or language, such as masked modeling and contrastive learning. However, randomly masking time series or calculating series-wise similarity will distort or neglect inherent temporal correlations crucial in time series data. To emphasize temporal correlation modeling, this paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks. Concretely, TimeSiam pre-trains Siamese encoders to capture intrinsic temporal correlations between randomly sampled past and current subseries. With a simple data augmentation method (e.g., masking), TimeSiam can benefit from diverse augmented subseries and learn internal time-dependent representations through a past-to-current reconstruction. Moreover, learnable lineage embeddings are also introduced to distinguish temporal distance between sampled series and further foster the learning of diverse temporal correlations. TimeSiam consistently outperforms extensive advanced pre-training baselines, demonstrating superior forecasting and classification capabilities across 13 standard benchmarks in both intra- and cross-domain scenarios.

AutoTimes: Autoregressive Time Series Forecasters via Large Language Models

Feb 04, 2024

Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long

Abstract: Foundation models of time series have not been fully developed due to the limited availability of large-scale time series and the underexploration of scalable pre-training. Based on the similar sequential structure of time series and natural language, increasing research demonstrates the feasibility of leveraging large language models (LLMs) for time series. Nevertheless, prior methods may overlook the consistency in aligning time series and natural language, resulting in insufficient utilization of the potential of LLMs. To fully exploit the general-purpose token transitions learned from language modeling, we propose AutoTimes to repurpose LLMs as Autoregressive Time series forecasters, which is consistent with the acquisition and utilization of LLMs without updating the parameters. The resulting forecasters can handle flexible series lengths and achieve performance competitive with prevalent models. Further, we present token-wise prompting that utilizes corresponding timestamps to make our method applicable to multimodal scenarios. Analysis demonstrates that our forecasters inherit the zero-shot and in-context learning capabilities of LLMs. Empirically, AutoTimes exhibits notable method generality and achieves enhanced performance when built on larger LLMs, additional texts, or time series as instructions.

Timer: Transformers for Time Series Analysis at Scale

Feb 04, 2024

Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world small-sample scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great power in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of large language models, which exhibit unprecedented ability in few-shot generalization, scalability, and task generality; these abilities are, however, absent in time series models. To change the current practices of training small models on specific datasets from scratch, this paper aims at an early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop a GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is pre-trained by autoregressive next-token prediction on large multi-domain datasets and is fine-tuned to downstream scenarios with promising abilities as an LTSM.
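
As a rough illustration of what "autoregressive next token prediction" can look like for a time series, the sketch below splits a series into fixed-length segments and builds (context, target) training pairs. It is not the authors' implementation: treating non-overlapping segments as tokens, the segment length, and the names `segment` and `next_token_pairs` are all assumptions made for this example.

```python
def segment(series, token_len):
    # Split a 1-D series into consecutive, non-overlapping segments ("tokens").
    # Trailing points that do not fill a whole segment are dropped.
    n_tokens = len(series) // token_len
    return [series[i * token_len:(i + 1) * token_len] for i in range(n_tokens)]


def next_token_pairs(tokens):
    # For each position t, the context is tokens[0..t] and the target is tokens[t+1].
    return [(tokens[: t + 1], tokens[t + 1]) for t in range(len(tokens) - 1)]


if __name__ == "__main__":
    series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    tokens = segment(series, token_len=2)
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
```

A generative model trained on such pairs predicts the next segment from all preceding ones, which is the same pattern a GPT-style language model follows with text tokens.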

Transolver: A Fast Transformer Solver for PDEs on General Geometries

Feb 04, 2024

Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, Mingsheng Long

Abstract: Transformers have empowered many milestones across various fields and have recently been applied to solve partial differential equations (PDEs). However, since PDEs are typically discretized into large-scale meshes with complex geometries, it is challenging for Transformers to capture intricate physical correlations directly from massive individual points. Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. Specifically, we propose a new Physics-Attention to adaptively split the discretized domain into a series of learnable slices of flexible shapes, where mesh points under similar physical states will be ascribed to the same slice. By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations under complex geometries, which also empowers the solver with endogenetic geometry-general modeling capacity and can be efficiently computed in linear complexity. Transolver achieves consistent state-of-the-art performance with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations, including car and airfoil designs.

EuLagNet: Eulerian Fluid Prediction with Lagrangian Dynamics

Feb 04, 2024

Qilong Ma, Haixu Wu, Lanxiang Xing, Jianmin Wang, Mingsheng Long

Abstract: Accurately predicting the future of fluids is important in many areas, such as meteorology, oceanology and aerodynamics. However, since the fluid is usually observed from an Eulerian perspective, its active and intricate dynamics are seriously obscured and confounded in static grids, bringing thorny challenges to the prediction. This paper introduces a new Lagrangian-guided paradigm to tackle the tangled fluid dynamics. Instead of solely predicting the future based on Eulerian observations, we propose the Eulerian-Lagrangian Dual Recurrent Network (EuLagNet), which captures multiscale fluid dynamics by tracking movements of adaptively sampled key particles on multiple scales and integrating dynamics information over time. Concretely, a EuLag Block is presented to communicate the learned Eulerian and Lagrangian features at each moment and scale, where the motion of tracked particles is inferred from Eulerian observations and their accumulated dynamics information is incorporated into Eulerian fields to guide future prediction. Tracking key particles not only provides a clear and interpretable clue for fluid dynamics but also makes our model free from modeling complex correlations among massive grids for better efficiency. Experimentally, EuLagNet excels in three challenging fluid prediction tasks, covering both 2D and 3D, simulated and real-world fluids.

HelmSim: Learning Helmholtz Dynamics for Interpretable Fluid Simulation

Oct 16, 2023

Lanxiang Xing, Haixu Wu, Yuezhou Ma, Jianmin Wang, Mingsheng Long

Abstract: Fluid simulation is a long-standing challenge due to the intrinsic high-dimensional non-linear dynamics. Previous methods usually utilize the non-linear modeling capability of deep models to directly estimate velocity fields for future prediction. However, skipping over inherent physical properties and directly learning superficial velocity fields can prevent the model from generating precise or physically reliable results. In this paper, we propose HelmSim as an accurate and interpretable simulator for fluid. Inspired by the Helmholtz theorem, we design a HelmDynamic block to learn the Helmholtz dynamics, which decomposes fluid dynamics into more solvable curl-free and divergence-free parts, physically corresponding to potential and stream functions of fluid. By embedding the HelmDynamic block into a Multiscale Integration Network, HelmSim can integrate learned Helmholtz dynamics along the temporal dimension in multiple spatial scales to predict the future fluid. Compared with previous velocity-estimating methods, HelmSim is faithfully derived from the Helmholtz theorem and unravels complex fluid dynamics with physically interpretable evidence. Experimentally, our proposed HelmSim achieves consistent state-of-the-art performance in both numerically simulated and real-world observed benchmarks, even for scenarios with complex boundaries.
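
For reference, the decomposition into curl-free and divergence-free parts mentioned in the abstract can be written for a 2D velocity field as below. This is the standard textbook form of the Helmholtz decomposition under the usual smoothness assumptions, not necessarily the exact parameterization used in the paper; the symbols $\Phi$ (potential function) and $\Psi$ (stream function) are notation chosen here.

```latex
\mathbf{u} = \nabla \Phi + \nabla^{\perp} \Psi,
\qquad
\nabla \Phi = \left( \frac{\partial \Phi}{\partial x},\ \frac{\partial \Phi}{\partial y} \right),
\qquad
\nabla^{\perp} \Psi = \left( \frac{\partial \Psi}{\partial y},\ -\frac{\partial \Psi}{\partial x} \right)

% The first part is curl-free and the second is divergence-free:
\frac{\partial}{\partial x}\frac{\partial \Phi}{\partial y}
  - \frac{\partial}{\partial y}\frac{\partial \Phi}{\partial x} = 0,
\qquad
\frac{\partial}{\partial x}\frac{\partial \Psi}{\partial y}
  - \frac{\partial}{\partial y}\frac{\partial \Psi}{\partial x} = 0
```

Both identities follow from the equality of mixed partial derivatives, so learning the two scalar fields $\Phi$ and $\Psi$ yields a velocity field whose two components have these properties by construction.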
