Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingli Song

Zhejiang University

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

May 27, 2023

Yihe Zhou, Shunyu Liu, Yunpeng Qing, Kaixuan Chen, Tongya Zheng, Yanhao Huang, Jie Song, Mingli Song

Figure 1 for Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Figure 2 for Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Figure 3 for Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Figure 4 for Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Abstract:Centralized Training with Decentralized Execution (CTDE) has recently emerged as a popular framework for cooperative Multi-Agent Reinforcement Learning (MARL), where agents can use additional global state information to guide training in a centralized way and make their own decisions only based on decentralized local policies. Despite the encouraging results achieved, CTDE makes an independence assumption on agent policies, which limits agents to adopt global cooperative information from each other during centralized training. Therefore, we argue that existing CTDE methods cannot fully utilize global information for training, leading to an inefficient joint-policy exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning, that not only enables an efficacious message exchange among agents during training but also guarantees the independent policies for execution. Firstly, CADP endows agents the explicit communication channel to seek and take advices from different agents for more centralized training. To further ensure the decentralized execution, we propose a smooth model pruning mechanism to progressively constraint the agent communication into a closed one without degradation in agent cooperation capability. Empirical evaluations on StarCraft II micromanagement and Google Research Football benchmarks demonstrate that the proposed framework achieves superior performance compared with the state-of-the-art counterparts. Our code will be made publicly available.

Via

Access Paper or Ask Questions

Non-stationary Online Convex Optimization with Arbitrary Delays

May 20, 2023

Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang

Abstract:Online convex optimization (OCO) with arbitrary delays, in which gradients or other information of functions could be arbitrarily delayed, has received increasing attention recently. Different from previous studies that focus on stationary environments, this paper investigates the delayed OCO in non-stationary environments, and aims to minimize the dynamic regret with respect to any sequence of comparators. To this end, we first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order. Despite its simplicity, our novel analysis shows that DOGD can attain an $O(\sqrt{dT}(P_T+1)$ dynamic regret bound in the worst case, where $d$ is the maximum delay, $T$ is the time horizon, and $P_T$ is the path length of comparators. More importantly, in case delays do not change the arrival order of gradients, it can automatically reduce the dynamic regret to $O(\sqrt{S}(1+P_T))$, where $S$ is the sum of delays. Furthermore, we develop an improved algorithm, which can reduce those dynamic regret bounds achieved by DOGD to $O(\sqrt{dT(P_T+1)})$ and $O(\sqrt{S(1+P_T)})$, respectively. The essential idea is to run multiple DOGD with different learning rates, and utilize a meta-algorithm to track the best one based on their delayed performance. Finally, we demonstrate that our improved algorithm is optimal in both cases by deriving a matching lower bound.

Via

Access Paper or Ask Questions

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

May 05, 2023

Lin Chen, Zhijie Jia, Tian Qiu, Lechao Cheng, Jie Lei, Zunlei Feng, Mingli Song

Abstract:A surge of interest has emerged in utilizing Transformers in diverse vision tasks owing to its formidable performance. However, existing approaches primarily focus on optimizing internal model architecture designs that often entail significant trial and error with high burdens. In this work, we propose a new paradigm dubbed Decision Stream Calibration that boosts the performance of general Vision Transformers. To achieve this, we shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions. Upon further analysis, it was discovered that 1) the final decision is associated with tokens of foreground targets, while token features of foreground target will be transmitted into the next layer as much as possible, and the useless token features of background area will be eliminated gradually in the forward propagation. 2) Each category is solely associated with specific sparse dimensions in the tokens. Based on the discoveries mentioned above, we designed a two-stage calibration scheme, namely ViT-Calibrator, including token propagation calibration stage and dimension propagation calibration stage. Extensive experiments on commonly used datasets show that the proposed approach can achieve promising results. The source codes are given in the supplements.

* At present, the paper involves internal projects of the company, and it is not convenient to publish it temporarily, so the article needs to be withdrawn temporarily

Via

Access Paper or Ask Questions

Transition Propagation Graph Neural Networks for Temporal Networks

Apr 15, 2023

Tongya Zheng, Zunlei Feng, Tianli Zhang, Yunzhi Hao, Mingli Song, Xingen Wang, Xinyu Wang, Ji Zhao, Chun Chen

Figure 1 for Transition Propagation Graph Neural Networks for Temporal Networks

Figure 2 for Transition Propagation Graph Neural Networks for Temporal Networks

Figure 3 for Transition Propagation Graph Neural Networks for Temporal Networks

Figure 4 for Transition Propagation Graph Neural Networks for Temporal Networks

Abstract:Researchers of temporal networks (e.g., social networks and transaction networks) have been interested in mining dynamic patterns of nodes from their diverse interactions. Inspired by recently powerful graph mining methods like skip-gram models and Graph Neural Networks (GNNs), existing approaches focus on generating temporal node embeddings sequentially with nodes' sequential interactions. However, the sequential modeling of previous approaches cannot handle the transition structure between nodes' neighbors with limited memorization capacity. Detailedly, an effective method for the transition structures is required to both model nodes' personalized patterns adaptively and capture node dynamics accordingly. In this paper, we propose a method, namely Transition Propagation Graph Neural Networks (TIP-GNN), to tackle the challenges of encoding nodes' transition structures. The proposed TIP-GNN focuses on the bilevel graph structure in temporal networks: besides the explicit interaction graph, a node's sequential interactions can also be constructed as a transition graph. Based on the bilevel graph, TIP-GNN further encodes transition structures by multi-step transition propagation and distills information from neighborhoods by a bilevel graph convolution. Experimental results over various temporal networks reveal the efficiency of our TIP-GNN, with at most 7.2\% improvements of accuracy on temporal link prediction. Extensive ablation studies further verify the effectiveness and limitations of the transition propagation module. Our code is available at \url{https://github.com/doujiang-zheng/TIP-GNN}.

* Published by IEEE Transactions on Neural Networks and Learning Systems, 2022

Via

Access Paper or Ask Questions

Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation

Apr 15, 2023

Tongya Zheng, Xinchao Wang, Zunlei Feng, Jie Song, Yunzhi Hao, Mingli Song, Xingen Wang, Xinyu Wang, Chun Chen

Figure 1 for Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation

Figure 2 for Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation

Figure 3 for Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation

Figure 4 for Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation

Abstract:Temporal graphs exhibit dynamic interactions between nodes over continuous time, whose topologies evolve with time elapsing. The whole temporal neighborhood of nodes reveals the varying preferences of nodes. However, previous works usually generate dynamic representation with limited neighbors for simplicity, which results in both inferior performance and high latency of online inference. Therefore, in this paper, we propose a novel method of temporal graph convolution with the whole neighborhood, namely Temporal Aggregation and Propagation Graph Neural Networks (TAP-GNN). Specifically, we firstly analyze the computational complexity of the dynamic representation problem by unfolding the temporal graph in a message-passing paradigm. The expensive complexity motivates us to design the AP (aggregation and propagation) block, which significantly reduces the repeated computation of historical neighbors. The final TAP-GNN supports online inference in the graph stream scenario, which incorporates the temporal information into node embeddings with a temporal activation function and a projection layer besides several AP blocks. Experimental results on various real-life temporal networks show that our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency. Our code is available at \url{https://github.com/doujiang-zheng/TAP-GNN}.

* Published by IEEE Transactions on Knowledge and Data Engineering, April 2023. We have updated the latest experimental results of baselines

Via

Access Paper or Ask Questions

Life Regression based Patch Slimming for Vision Transformers

Apr 11, 2023

Jiawei Chen, Lin Chen, Jiang Yang, Tianqi Shi, Lechao Cheng, Zunlei Feng, Mingli Song

Abstract:Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, the high inference computation cost poses a new challenge. Several methods have been proposed to address this problem, mainly by slimming patches. In the inference stage, these methods classify patches into two classes, one to keep and the other to discard in multiple layers. This approach results in additional computation at every layer where patches are discarded, which hinders inference acceleration. In this study, we tackle the patch slimming problem from a different perspective by proposing a life regression module that determines the lifespan of each image patch in one go. During inference, the patch is discarded once the current layer index exceeds its life. Our proposed method avoids additional computation and parameters in multiple layers to enhance inference speed while maintaining competitive performance. Additionally, our approach requires fewer training epochs than other patch slimming methods.

* 8 pages,4 figures

Via

Access Paper or Ask Questions

Propheter: Prophetic Teacher Guided Long-Tailed Distribution Learning

Apr 09, 2023

Wenxiang Xu, Linyun Zhou, Lin Chen, Lechao Cheng, Jie Lei, Zunlei Feng, Mingli Song

Abstract:The problem of deep long-tailed learning, a prevalent challenge in the realm of generic visual recognition, persists in a multitude of real-world applications. To tackle the heavily-skewed dataset issue in long-tailed classification, prior efforts have sought to augment existing deep models with the elaborate class-balancing strategies, such as class rebalancing, data augmentation, and module improvement. Despite the encouraging performance, the limited class knowledge of the tailed classes in the training dataset still bottlenecks the performance of the existing deep models. In this paper, we propose an innovative long-tailed learning paradigm that breaks the bottleneck by guiding the learning of deep networks with external prior knowledge. This is specifically achieved by devising an elaborated ``prophetic'' teacher, termed as ``Propheter'', that aims to learn the potential class distributions. The target long-tailed prediction model is then optimized under the instruction of the well-trained ``Propheter'', such that the distributions of different classes are as distinguishable as possible from each other. Experiments on eight long-tailed benchmarks across three architectures demonstrate that the proposed prophetic paradigm acts as a promising solution to the challenge of limited class knowledge in long-tailed datasets. Our code and model can be found in the supplementary material.

Via

Access Paper or Ask Questions

Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Mar 26, 2023

Tianli Zhang, Mengqi Xue, Jiangtao Zhang, Haofei Zhang, Yu Wang, Lechao Cheng, Jie Song, Mingli Song

Figure 1 for Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Figure 2 for Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Figure 3 for Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Figure 4 for Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

Abstract:Most existing online knowledge distillation(OKD) techniques typically require sophisticated modules to produce diverse knowledge for improving students' generalization ability. In this paper, we strive to fully utilize multi-model settings instead of well-designed modules to achieve a distillation effect with excellent generalization performance. Generally, model generalization can be reflected in the flatness of the loss landscape. Since averaging parameters of multiple models can find flatter minima, we are inspired to extend the process to the sampled convex combinations of multi-student models in OKD. Specifically, by linearly weighting students' parameters in each training batch, we construct a Hybrid-Weight Model(HWM) to represent the parameters surrounding involved students. The supervision loss of HWM can estimate the landscape's curvature of the whole region around students to measure the generalization explicitly. Hence we integrate HWM's loss into students' training and propose a novel OKD framework via parameter hybridization(OKDPH) to promote flatter minima and obtain robust solutions. Considering the redundancy of parameters could lead to the collapse of HWM, we further introduce a fusion operation to keep the high similarity of students. Compared to the state-of-the-art(SOTA) OKD methods and SOTA methods of seeking flat minima, our OKDPH achieves higher performance with fewer parameters, benefiting OKD with lightweight and robust characteristics. Our code is publicly available at https://github.com/tianlizhang/OKDPH.

* Accepted by cvpr2023

Via

Access Paper or Ask Questions

Schema Inference for Interpretable Image Classification

Mar 12, 2023

Haofei Zhang, Mengqi Xue, Xiaokang Liu, Kaixuan Chen, Jie Song, Mingli Song

Abstract:In this paper, we study a novel inference paradigm, termed as schema inference, that learns to deductively infer the explainable predictions by rebuilding the prior deep neural network (DNN) forwarding scheme, guided by the prevalent philosophical cognitive concept of schema. We strive to reformulate the conventional model inference pipeline into a graph matching policy that associates the extracted visual concepts of an image with the pre-computed scene impression, by analogy with human reasoning mechanism via impression matching. To this end, we devise an elaborated architecture, termed as SchemaNet, as a dedicated instantiation of the proposed schema inference concept, that models both the visual semantics of input instances and the learned abstract imaginations of target categories as topological relational graphs. Meanwhile, to capture and leverage the compositional contributions of visual semantics in a global view, we also introduce a universal Feat2Graph scheme in SchemaNet to establish the relational graphs that contain abundant interaction information. Both the theoretical analysis and the experimental results on several benchmarks demonstrate that the proposed schema inference achieves encouraging performance and meanwhile yields a clear picture of the deductive process leading to the predictions. Our code is available at https://github.com/zhfeing/SchemaNet-PyTorch.

Via

Access Paper or Ask Questions

Team DETR: Guide Queries as a Professional Team in Detection Transformers

Feb 28, 2023

Tian Qiu, Linyun Zhou, Wenxiang Xu, Lechao Cheng, Zunlei Feng, Mingli Song

Figure 1 for Team DETR: Guide Queries as a Professional Team in Detection Transformers

Figure 2 for Team DETR: Guide Queries as a Professional Team in Detection Transformers

Figure 3 for Team DETR: Guide Queries as a Professional Team in Detection Transformers

Figure 4 for Team DETR: Guide Queries as a Professional Team in Detection Transformers

Abstract:Recent proposed DETR variants have made tremendous progress in various scenarios due to their streamlined processes and remarkable performance. However, the learned queries usually explore the global context to generate the final set prediction, resulting in redundant burdens and unfaithful results. More specifically, a query is commonly responsible for objects of different scales and positions, which is a challenge for the query itself, and will cause spatial resource competition among queries. To alleviate this issue, we propose Team DETR, which leverages query collaboration and position constraints to embrace objects of interest more precisely. We also dynamically cater to each query member's prediction preference, offering the query better scale and spatial priors. In addition, the proposed Team DETR is flexible enough to be adapted to other existing DETR variants without increasing parameters and calculations. Extensive experiments on the COCO dataset show that Team DETR achieves remarkable gains, especially for small and large objects. Code is available at \url{https://github.com/horrible-dong/TeamDETR}.

Via

Access Paper or Ask Questions