Deep Spiking Neural Networks (SNNs) are harder to train than ANNs because of their discrete binary activation and spatio-temporal domain error back-propagation. Considering the huge success of ResNet in ANNs' deep learning, it is natural to attempt to use residual learning to train deep SNNs. Previous Spiking ResNet used a similar residual block to the standard block of ResNet, which we regard as inadequate for SNNs and which still causes the degradation problem. In this paper, we propose the spike-element-wise (SEW) residual block and prove that it can easily implement the residual learning. We evaluate our SEW ResNet on ImageNet. The experiment results show that the SEW ResNet can obtain higher performance by simply adding more layers, providing a simple method to train deep SNNs.
One of the biggest challenges in multi-agent reinforcement learning is coordination, a typical application scenario of this is traffic signal control. Recently, it has attracted a rising number of researchers and has become a hot research field with great practical significance. In this paper, we propose a novel method called MetaVRS~(Meta Variational RewardShaping) for traffic signal coordination control. By heuristically applying the intrinsic reward to the environmental reward, MetaVRS can wisely capture the agent-to-agent interplay. Besides, latent variables generated by VAE are brought into policy for automatically tradeoff between exploration and exploitation to optimize the policy. In addition, meta learning was used in decoder for faster adaptation and better approximation. Empirically, we demonstate that MetaVRS substantially outperforms existing methods and shows superior adaptability, which predictably has a far-reaching significance to the multi-agent traffic signal coordination control.
The goal of traffic signal control is to coordinate multiple traffic signals to improve the traffic efficiency of a district or a city. In this work, we propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method, and aim to learn the decentralized polices of each traffic signal only conditioned on its local observation. MetaVIM makes three novel contributions. Firstly, to make the model available to new unseen target scenarios, we formulate the traffic signal control as a meta-learning problem over a set of related tasks. The train scenario is divided as multiple partially observable Markov decision process (POMDP) tasks, and each task corresponds to a traffic light. In each task, the neighbours are regarded as an unobserved part of the state. Secondly, we assume that the reward, transition and policy functions vary across different tasks but share a common structure, and a learned latent variable conditioned on the past trajectories is proposed for each task to represent the specific information of the current task in these functions, then is further brought into policy for automatically trade off between exploration and exploitation to induce the RL agent to choose the reasonable action. In addition, to make the policy learning stable, four decoders are introduced to predict the received observations and rewards of the current agent with/without neighbour agents' policies, and a novel intrinsic reward is designed to encourage the received observation and reward invariant to the neighbour agents. Empirically, extensive experiments conducted on CityFlow demonstrate that the proposed method substantially outperforms existing methods and shows superior generalizability.
Online image hashing has received increasing research attention recently, which processes large-scale data in a streaming fashion to update the hash functions on-the-fly. To this end, most existing works exploit this problem under a supervised setting, i.e., using class labels to boost the hashing performance, which suffers from the defects in both adaptivity and efficiency: First, large amounts of training batches are required to learn up-to-date hash functions, which leads to poor online adaptivity. Second, the training is time-consuming, which contradicts with the core need of online learning. In this paper, a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH), is proposed to address the above two challenges by introducing a novel and efficient inner product operation. To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches. Quantitatively, such a decomposition further leads to at least 75% storage saving. To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently. Without additional constraints and variables, the time complexity is significantly reduced. Such a scheme is also quantitatively shown to well preserve past information during updating hashing functions. We have quantitatively demonstrated that the collective effort of class-wise updating and semi-relaxation optimization provides a superior performance comparing to various state-of-the-art methods, which is verified through extensive experiments on three widely-used datasets.
Deep learning has achieved great success in recognizing video actions, but the collection and annotation of training data are still laborious, which mainly lies in two aspects: (1) the amount of required annotated data is large; (2) temporally annotating the location of each action is time-consuming. Works such as few-shot learning or untrimmed video recognition have been proposed to handle either one aspect or the other. However, very few existing works can handle both aspects simultaneously. In this paper, we target a new problem, Annotation-Efficient Video Recognition, to reduce the requirement of annotations for both large amount of samples from different classes and the action locations. Challenges of this problem come from three folds: (1) action recognition from untrimmed videos, (2) weak supervision, and (3) novel classes with only a few training samples. To address the first two challenges, we propose a background pseudo-labeling method based on open-set detection. To tackle the third challenge, we propose a self-weighted classification mechanism and a contrastive learning method to separate background and foreground of the untrimmed videos. Extensive experiments on ActivityNet v1.2 and ActivityNet v1.3 verify the effectiveness of the proposed methods. Codes will be released online.
Open set recognition is an emerging research area that aims to simultaneously classify samples from predefined classes and identify the rest as 'unknown'. In this process, one of the key challenges is to reduce the risk of generalizing the inherent characteristics of numerous unknown samples learned from a small amount of known data. In this paper, we propose a new concept, Reciprocal Point, which is the potential representation of the extra-class space corresponding to each known category. The sample can be classified to known or unknown by the otherness with reciprocal points. To tackle the open set problem, we offer a novel open space risk regularization term. Based on the bounded space constructed by reciprocal points, the risk of unknown is reduced through multi-category interaction. The novel learning framework called Reciprocal Point Learning (RPL), which can indirectly introduce the unknown information into the learner with only known classes, so as to learn more compact and discriminative representations. Moreover, we further construct a new large-scale challenging aircraft dataset for open set recognition: Aircraft 300 (Air-300). Extensive experiments on multiple benchmark datasets indicate that our framework is significantly superior to other existing approaches and achieves state-of-the-art performance on standard open set benchmarks.
The small objects in images and videos are usually not independent individuals. Instead, they more or less present some semantic and spatial layout relationships with each other. Modeling and inferring such intrinsic relationships can thereby be beneficial for small object detection. In this paper, we propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects. Specifically, we first construct a semantic module to model the sparse semantic relationships based on the initial regional features, and a spatial layout module to model the sparse spatial layout relationships based on their position and shape information, respectively. Both of them are then fed into a context reasoning module for integrating the contextual information with respect to the objects and their relationships, which is further fused with the original regional visual features for classification and regression. Experimental results reveal that the proposed approach can effectively boost the small object detection performance.
Given base classes with sufficient labeled samples, the target of few-shot classification is to recognize unlabeled samples of novel classes with only a few labeled samples. Most existing methods only pay attention to the relationship between labeled and unlabeled samples of novel classes, which do not make full use of information within base classes. In this paper, we make two contributions to investigate the few-shot classification problem. First, we report a simple and effective baseline trained on base classes in the way of traditional supervised learning, which can achieve comparable results to the state of the art. Second, based on the baseline, we propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy. Experiments on two widely used benchmarks show that our method is a simple and effective framework, and a new state of the art is established in the few-shot classification field.
Cross-domain few-shot learning (FSL) is proposed recently to transfer knowledge from general-domain known classes (e.g., ImageNet) to novel classes in other domains, and recognize novel classes with only few training samples. In this paper, we go further to define a more challenging scenario that transfers knowledge from general-domain known classes to novel classes in distant domains which are significantly different from the general domain, e.g., medical data. To solve this challenging problem, we propose to exploit mid-level features, which are more transferable, yet under-explored in recent main-stream FSL works. To boost the discriminability of mid-level features, we propose a residual-prediction task for the training on known classes. In this task, we view the current training sample as a sample from a pseudo-novel class, so as to provide simulated novel-class data. However, this simulated data is from the same domain as known classes, and shares high-level patterns with other known classes. Therefore, we then use high-level patterns from other known classes to represent it and remove this high-level representation from the simulated data, outputting a residual term containing discriminative information of it that could not be represented by high-level patterns from other known classes. Then, mid-level features from multiple mid-layers are dynamically weighted to predict this residual term, which encourages the mid-level features to be discriminative. Notably, our method can be applied to both the regular in-domain FSL setting by emphasizing high-level & transformed mid-level features and the distant-domain FSL setting by emphasizing mid-level features. Experiments under both settings on six public datasets (including two challenging medical datasets) validate the rationale of the proposed method, demonstrating state-of-the-art performance on both settings.
Often the best performing deep neural models are ensembles of multiple base-level networks, nevertheless, ensemble learning with respect to domain adaptive person re-ID remains unexplored. In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction about model ensemble problem under unsupervised conditions. MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain as expert models equipped with specific features and knowledge, while the adaptation is then accomplished through brainstorming (mutual learning) among expert models. MEB-Net accommodates the heterogeneity of experts learned with different architectures and enhances discrimination capability of the adapted re-ID model, by introducing a regularization scheme about authority of experts. Extensive experiments on large-scale datasets (Market-1501 and DukeMTMC-reID) demonstrate the superior performance of MEB-Net over the state-of-the-arts.