Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingsheng Long

Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Nov 13, 2022

Yiwen Qiu, Jialong Wu, Zhangjie Cao, Mingsheng Long

Figure 1 for Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Figure 2 for Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Figure 3 for Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Figure 4 for Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Abstract:Existing imitation learning works mainly assume that the demonstrator who collects demonstrations shares the same dynamics as the imitator. However, the assumption limits the usage of imitation learning, especially when collecting demonstrations for the imitator is difficult. In this paper, we study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption to that the demonstrator and the imitator have the same state spaces but could have different action spaces and dynamics. OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge: some demonstrations cannot be achieved by the imitator due to the different dynamics. Prior works try to filter out such demonstrations by feasibility measurements, but ignore the fact that the demonstrations exhibit a multimodal distribution since the different demonstrators may take different policies in different dynamics. We develop a better transferability measurement to tackle this newly-emerged challenge. We firstly design a novel sequence-based contrastive clustering algorithm to cluster demonstrations from the same mode to avoid the mutual interference of demonstrations from different modes, and then learn the transferability of each demonstration with an adversarial-learning based algorithm in each cluster. Experiment results on several MuJoCo environments, a driving environment, and a simulated robot environment show that the proposed transferability measurement more accurately finds and down-weights non-transferable demonstrations and outperforms prior works on the final imitation learning performance. We show the videos of our experiment results on our website.

Via

Access Paper or Ask Questions

TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Oct 05, 2022

Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long

Figure 1 for TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Figure 2 for TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Figure 3 for TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Figure 4 for TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

Abstract:Time series analysis is of immense importance in extensive applications, such as weather forecasting, anomaly detection, and action recognition. This paper focuses on temporal variation modeling, which is the common key problem of extensive analysis tasks. Previous methods attempt to accomplish this directly from the 1D time series, which is extremely challenging due to the intricate temporal patterns. Based on the observation of multi-periodicity in time series, we ravel out the complex temporal variations into the multiple intraperiod- and interperiod-variations. To tackle the limitations of 1D time series in representation capability, we extend the analysis of temporal variations into the 2D space by transforming the 1D time series into a set of 2D tensors based on multiple periods. This transformation can embed the intraperiod- and interperiod-variations into the columns and rows of the 2D tensors respectively, making the 2D-variations to be easily modeled by 2D kernels. Technically, we propose the TimesNet with TimesBlock as a task-general backbone for time series analysis. TimesBlock can discover the multi-periodicity adaptively and extract the complex temporal variations from transformed 2D tensors by a parameter-efficient inception block. Our proposed TimesNet achieves consistent state-of-the-art in five mainstream time series analysis tasks, including short- and long-term forecasting, imputation, classification, and anomaly detection.

Via

Access Paper or Ask Questions

Recommender Transformers with Behavior Pathways

Jun 13, 2022

Zhiyu Yao, Xinyang Chen, Sinan Wang, Qinyan Dai, Yumeng Li, Tanchao Zhu, Mingsheng Long

Figure 1 for Recommender Transformers with Behavior Pathways

Figure 2 for Recommender Transformers with Behavior Pathways

Figure 3 for Recommender Transformers with Behavior Pathways

Figure 4 for Recommender Transformers with Behavior Pathways

Abstract:Sequential recommendation requires the recommender to capture the evolving behavior characteristics from logged user behavior data for accurate recommendations. However, user behavior sequences are viewed as a script with multiple ongoing threads intertwined. We find that only a small set of pivotal behaviors can be evolved into the user's future action. As a result, the future behavior of the user is hard to predict. We conclude this characteristic for sequential behaviors of each user as the Behavior Pathway. Different users have their unique behavior pathways. Among existing sequential models, transformers have shown great capacity in capturing global-dependent characteristics. However, these models mainly provide a dense distribution over all previous behaviors using the self-attention mechanism, making the final predictions overwhelmed by the trivial behaviors not adjusted to each user. In this paper, we build the Recommender Transformer (RETR) with a novel Pathway Attention mechanism. RETR can dynamically plan the behavior pathway specified for each user, and sparingly activate the network through this behavior pathway to effectively capture evolving patterns useful for recommendation. The key design is a learned binary route to prevent the behavior pathway from being overwhelmed by trivial behaviors. We empirically verify the effectiveness of RETR on seven real-world datasets and RETR yields state-of-the-art performance.

Via

Access Paper or Ask Questions

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Jun 08, 2022

Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long

Figure 1 for Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Figure 2 for Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Figure 3 for Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Figure 4 for Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Abstract:Transfer learning aims to leverage knowledge from pre-trained models to benefit the target task. Prior transfer learning work mainly transfers from a single model. However, with the emergence of deep models pre-trained from different resources, model hubs consisting of diverse models with various architectures, pre-trained datasets and learning paradigms are available. Directly applying single-model transfer learning methods to each model wastes the abundant knowledge of the model hub and suffers from high computational cost. In this paper, we propose a Hub-Pathway framework to enable knowledge transfer from a model hub. The framework generates data-dependent pathway weights, based on which we assign the pathway routes at the input level to decide which pre-trained models are activated and passed through, and then set the pathway aggregation at the output level to aggregate the knowledge from different models to make predictions. The proposed framework can be trained end-to-end with the target task-specific loss, where it learns to explore better pathway configurations and exploit the knowledge in pre-trained models for each target datum. We utilize a noisy pathway generator and design an exploration loss to further explore different pathways throughout the model hub. To fully exploit the knowledge in pre-trained models, each model is further trained by specific data that activate it, which ensures its performance and enhances knowledge transfer. Experiment results on computer vision and reinforcement learning tasks demonstrate that the proposed Hub-Pathway framework achieves the state-of-the-art performance for model hub transfer learning.

Via

Access Paper or Ask Questions

Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

May 31, 2022

Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long

Figure 1 for Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

Figure 2 for Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

Figure 3 for Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

Figure 4 for Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting

Abstract:Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degenerate terribly on non-stationary real-world data in which the joint distribution changes over time. Previous studies primarily adopt stationarization to reduce the non-stationarity of original series for better predictability. But the stationarized series deprived of inherent non-stationarity can be less instructive for real-world bursty events forecasting. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Concretely, Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability. To address over-stationarization, De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from unstationarized series. Our Non-stationary Transformers framework consistently boosts mainstream Transformers by a large margin, which reduces 49.43% MSE on Transformer, 47.34% on Informer, and 46.89% on Reformer, making them the state-of-the-art in time series forecasting.

Via

Access Paper or Ask Questions

MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Apr 15, 2022

Chao Huang, Zhangjie Cao, Yunbo Wang, Jianmin Wang, Mingsheng Long

Figure 1 for MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Figure 2 for MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Figure 3 for MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Figure 4 for MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Abstract:Deep learning techniques for point clouds have achieved strong performance on a range of 3D vision tasks. However, it is costly to annotate large-scale point sets, making it critical to learn generalizable representations that can transfer well across different point sets. In this paper, we study a new problem of 3D Domain Generalization (3DDG) with the goal to generalize the model to other unseen domains of point clouds without any access to them in the training process. It is a challenging problem due to the substantial geometry shift from simulated to real data, such that most existing 3D models underperform due to overfitting the complete geometries in the source domain. We propose to tackle this problem via MetaSets, which meta-learns point cloud representations from a group of classification tasks on carefully-designed transformed point sets containing specific geometry priors. The learned representations are more generalizable to various unseen domains of different geometries. We design two benchmarks for Sim-to-Real transfer of 3D point clouds. Experimental results show that MetaSets outperforms existing 3D deep learning methods by large margins.

* 13 pages, CVPR 2021

Via

Access Paper or Ask Questions

Continual Predictive Learning from Videos

Apr 12, 2022

Geng Chen, Wendong Zhang, Han Lu, Siyu Gao, Yunbo Wang, Mingsheng Long, Xiaokang Yang

Figure 1 for Continual Predictive Learning from Videos

Figure 2 for Continual Predictive Learning from Videos

Figure 3 for Continual Predictive Learning from Videos

Figure 4 for Continual Predictive Learning from Videos

Abstract:Predictive learning ideally builds the world model of physical processes in one or more given environments. Typical setups assume that we can collect data from all environments at all times. In practice, however, different prediction tasks may arrive sequentially so that the environments may change persistently throughout the training procedure. Can we develop predictive learning algorithms that can deal with more realistic, non-stationary physical environments? In this paper, we study a new continual learning problem in the context of video prediction, and observe that most existing methods suffer from severe catastrophic forgetting in this setup. To tackle this problem, we propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay and performs test-time adaptation with non-parametric task inference. We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions. Our approach is shown to effectively mitigate forgetting and remarkably outperform the na\"ive combinations of previous art in video prediction and continual learning.

* Accepted by CVPR 2022 (Oral). Code is available at https://github.com/jc043/CPL

Via

Access Paper or Ask Questions

A Roadmap for Big Model

Apr 02, 2022

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He(+90 more)

Abstract:With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

* arXiv admin note: text overlap with arXiv:2107.06499 by other authors

Via

Access Paper or Ask Questions

From Big to Small: Adaptive Learning to Partial-Set Domains

Mar 14, 2022

Zhangjie Cao, Kaichao You, Ziyang Zhang, Jianmin Wang, Mingsheng Long

Figure 1 for From Big to Small: Adaptive Learning to Partial-Set Domains

Figure 2 for From Big to Small: Adaptive Learning to Partial-Set Domains

Figure 3 for From Big to Small: Adaptive Learning to Partial-Set Domains

Figure 4 for From Big to Small: Adaptive Learning to Partial-Set Domains

Abstract:Domain adaptation targets at knowledge acquisition and dissemination from a labeled source domain to an unlabeled target domain under distribution shift. Still, the common requirement of identical class space shared across domains hinders applications of domain adaptation to partial-set domains. Recent advances show that deep pre-trained models of large scale endow rich knowledge to tackle diverse downstream tasks of small scale. Thus, there is a strong incentive to adapt models from large-scale domains to small-scale domains. This paper introduces Partial Domain Adaptation (PDA), a learning paradigm that relaxes the identical class space assumption to that the source class space subsumes the target class space. First, we present a theoretical analysis of partial domain adaptation, which uncovers the importance of estimating the transferable probability of each class and each instance across domains. Then, we propose Selective Adversarial Network (SAN and SAN++) with a bi-level selection strategy and an adversarial adaptation mechanism. The bi-level selection strategy up-weighs each class and each instance simultaneously for source supervised training, target self-training, and source-target adversarial adaptation through the transferable probability estimated alternately by the model. Experiments on standard partial-set datasets and more challenging tasks with superclasses show that SAN++ outperforms several domain adaptation methods.

* accepted to TPAMI in 2022

Via

Access Paper or Ask Questions

Debiased Pseudo Labeling in Self-Training

Feb 18, 2022

Baixu Chen, Junguang Jiang, Ximei Wang, Jianmin Wang, Mingsheng Long

Figure 1 for Debiased Pseudo Labeling in Self-Training

Figure 2 for Debiased Pseudo Labeling in Self-Training

Figure 3 for Debiased Pseudo Labeling in Self-Training

Figure 4 for Debiased Pseudo Labeling in Self-Training

Abstract:Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets. However, large-scale annotations are time-consuming and labor-exhaustive to obtain on realistic tasks. To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data. Despite its popularity, pseudo labeling is well-believed to be unreliable and often leads to training instability. Our experimental studies further reveal that the performance of self-training is biased due to data sampling, pre-trained models, and training strategies, especially the inappropriate utilization of pseudo labels. To this end, we propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads. To further improve the quality of pseudo labels, we introduce a worst-case estimation of pseudo labeling and seamlessly optimize the representations to avoid the worst-case. Extensive experiments justify that the proposed Debiased not only yields an average improvement of $14.4$\% against state-of-the-art algorithms on $11$ tasks (covering generic object recognition, fine-grained object recognition, texture classification, and scene classification) but also helps stabilize training and balance performance across classes.

Via

Access Paper or Ask Questions