Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yonghong Tian

Pengcheng Laboratory, Peking University

Spike-driven Transformer

Jul 04, 2023

Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

Abstract:Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.

Via

Access Paper or Ask Questions

Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Jun 18, 2023

Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian

Figure 1 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 2 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 3 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 4 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Abstract:Few-shot learning aims to recognize novel queries with limited support samples by learning from base knowledge. Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which are usually infeasible for realistic applications. Toward this issue, we propose to address the cross-domain few-shot learning problem where only extremely few samples are available in target domains. Under this realistic setting, we focus on the fast adaptation capability of meta-learners by proposing an effective dual adaptive representation alignment approach. In our approach, a prototypical feature alignment is first proposed to recalibrate support instances as prototypes and reproject these prototypes with a differentiable closed-form solution. Therefore feature spaces of learned knowledge can be adaptively transformed to query spaces by the cross-instance and cross-prototype relations. Besides the feature alignment, we further present a normalized distribution alignment module, which exploits prior statistics of query samples for solving the covariant shifts among the support and query samples. With these two modules, a progressive meta-learning framework is constructed to perform the fast adaptation with extremely few-shot samples while maintaining its generalization capabilities. Experimental evidence demonstrates our approach achieves new state-of-the-art results on 4 CDFSL benchmarks and 4 fine-grained cross-domain benchmarks.

* 13 pages; Accepted by IEEE T-PAMI

Via

Access Paper or Ask Questions

Population-Based Evolutionary Gaming for Unsupervised Person Re-identification

Jun 08, 2023

Yunpeng Zhai, Peixi Peng, Mengxi Jia, Shiyong Li, Weiqiang Chen, Xuesong Gao, Yonghong Tian

Abstract:Unsupervised person re-identification has achieved great success through the self-improvement of individual neural networks. However, limited by the lack of diversity of discriminant information, a single network has difficulty learning sufficient discrimination ability by itself under unsupervised conditions. To address this limit, we develop a population-based evolutionary gaming (PEG) framework in which a population of diverse neural networks is trained concurrently through selection, reproduction, mutation, and population mutual learning iteratively. Specifically, the selection of networks to preserve is modeled as a cooperative game and solved by the best-response dynamics, then the reproduction and mutation are implemented by cloning and fluctuating hyper-parameters of networks to learn more diversity, and population mutual learning improves the discrimination of networks by knowledge distillation from each other within the population. In addition, we propose a cross-reference scatter (CRS) to approximately evaluate re-ID models without labeled samples and adopt it as the criterion of network selection in PEG. CRS measures a model's performance by indirectly estimating the accuracy of its predicted pseudo-labels according to the cohesion and separation of the feature space. Extensive experiments demonstrate that (1) CRS approximately measures the performance of models without labeled samples; (2) and PEG produces new state-of-the-art accuracy for person re-identification, indicating the great potential of population-based network cooperative training for unsupervised learning.

* Accepted in IJCV

Via

Access Paper or Ask Questions

Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli

Jun 02, 2023

Liwei Huang, ZhengYu Ma, Huihui Zhou, Yonghong Tian

Figure 1 for Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli

Figure 2 for Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli

Figure 3 for Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli

Figure 4 for Deep recurrent spiking neural networks capture both static and dynamic representations of the visual cortex under movie stimuli

Abstract:In the real world, visual stimuli received by the biological visual system are predominantly dynamic rather than static. A better understanding of how the visual cortex represents movie stimuli could provide deeper insight into the information processing mechanisms of the visual system. Although some progress has been made in modeling neural responses to natural movies with deep neural networks, the visual representations of static and dynamic information under such time-series visual stimuli remain to be further explored. In this work, considering abundant recurrent connections in the mouse visual system, we design a recurrent module based on the hierarchy of the mouse cortex and add it into Deep Spiking Neural Networks, which have been demonstrated to be a more compelling computational model for the visual cortex. Using Time-Series Representational Similarity Analysis, we measure the representational similarity between networks and mouse cortical regions under natural movie stimuli. Subsequently, we conduct a comparison of the representational similarity across recurrent/feedforward networks and image/video training tasks. Trained on the video action recognition task, recurrent SNN achieves the highest representational similarity and significantly outperforms feedforward SNN trained on the same task by 15% and the recurrent SNN trained on the image classification task by 8%. We investigate how static and dynamic representations of SNNs influence the similarity, as a way to explain the importance of these two forms of representations in biological neural coding. Taken together, our work is the first to apply deep recurrent SNNs to model the mouse visual cortex under movie stimuli and we establish that these networks are competent to capture both static and dynamic representations and make contributions to understanding the movie information processing mechanisms of the visual cortex.

Via

Access Paper or Ask Questions

Auto-Spikformer: Spikformer Architecture Search

Jun 01, 2023

Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqi Chen, Shuaijie Shen, Li Yuan, Yonghong Tian

Abstract:The integration of self-attention mechanisms into Spiking Neural Networks (SNNs) has garnered considerable interest in the realm of advanced deep learning, primarily due to their biological properties. Recent advancements in SNN architecture, such as Spikformer, have demonstrated promising outcomes by leveraging Spiking Self-Attention (SSA) and Spiking Patch Splitting (SPS) modules. However, we observe that Spikformer may exhibit excessive energy consumption, potentially attributable to redundant channels and blocks. To mitigate this issue, we propose Auto-Spikformer, a one-shot Transformer Architecture Search (TAS) method, which automates the quest for an optimized Spikformer architecture. To facilitate the search process, we propose methods Evolutionary SNN neurons (ESNN), which optimizes the SNN parameters, and apply the previous method of weight entanglement supernet training, which optimizes the Vision Transformer (ViT) parameters. Moreover, we propose an accuracy and energy balanced fitness function $\mathcal{F}_{AEB}$ that jointly considers both energy consumption and accuracy, and aims to find a Pareto optimal combination that balances these two objectives. Our experimental results demonstrate the effectiveness of Auto-Spikformer, which outperforms the state-of-the-art method including CNN or ViT models that are manually or automatically designed while significantly reducing energy consumption.

Via

Access Paper or Ask Questions

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

May 24, 2023

Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

Figure 1 for Album Storytelling with Iterative Story-aware Captioning and Large Language Models

Figure 2 for Album Storytelling with Iterative Story-aware Captioning and Large Language Models

Figure 3 for Album Storytelling with Iterative Story-aware Captioning and Large Language Models

Figure 4 for Album Storytelling with Iterative Story-aware Captioning and Large Language Models

Abstract:This work studies how to transform an album to vivid and coherent stories, a task we refer to as "album storytelling". While this task can help preserve memories and facilitate experience sharing, it remains an underexplored area in current literature. With recent advances in Large Language Models (LLMs), it is now possible to generate lengthy, coherent text, opening up the opportunity to develop an AI assistant for album storytelling. One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story. However, we find this often results in stories containing hallucinated information that contradicts the images, as each generated caption ("story-agnostic") is not always about the description related to the whole story or miss some necessary information. To address these limitations, we propose a new iterative album storytelling pipeline. Specifically, we start with an initial story and build a story-aware caption model to refine the captions using the whole story as guidance. The polished captions are then fed into the LLMs to generate a new refined story. This process is repeated iteratively until the story contains minimal factual errors while maintaining coherence. To evaluate our proposed pipeline, we introduce a new dataset of image collections from vlogs and a set of systematic evaluation metrics. Our results demonstrate that our method effectively generates more accurate and engaging stories for albums, with enhanced coherence and vividness.

Via

Access Paper or Ask Questions

Temporal Contrastive Learning for Spiking Neural Networks

May 23, 2023

Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian

Figure 1 for Temporal Contrastive Learning for Spiking Neural Networks

Figure 2 for Temporal Contrastive Learning for Spiking Neural Networks

Figure 3 for Temporal Contrastive Learning for Spiking Neural Networks

Figure 4 for Temporal Contrastive Learning for Spiking Neural Networks

Abstract:Biologically inspired spiking neural networks (SNNs) have garnered considerable attention due to their low-energy consumption and spatio-temporal information processing capabilities. Most existing SNNs training methods first integrate output information across time steps, then adopt the cross-entropy (CE) loss to supervise the prediction of the average representations. However, in this work, we find the method above is not ideal for the SNNs training as it omits the temporal dynamics of SNNs and degrades the performance quickly with the decrease of inference time steps. One tempting method to model temporal correlations is to apply the same label supervision at each time step and treat them identically. Although it can acquire relatively consistent performance across various time steps, it still faces challenges in obtaining SNNs with high performance. Inspired by these observations, we propose Temporal-domain supervised Contrastive Learning (TCL) framework, a novel method to obtain SNNs with low latency and high performance by incorporating contrastive supervision with temporal domain information. Contrastive learning (CL) prompts the network to discern both consistency and variability in the representation space, enabling it to better learn discriminative and generalizable features. We extend this concept to the temporal domain of SNNs, allowing us to flexibly and fully leverage the correlation between representations at different time steps. Furthermore, we propose a Siamese Temporal-domain supervised Contrastive Learning (STCL) framework to enhance the SNNs via augmentation, temporal and class constraints simultaneously. Extensive experimental results demonstrate that SNNs trained by our TCL and STCL can achieve both high performance and low latency, achieving state-of-the-art performance on a variety of datasets (e.g., CIFAR-10, CIFAR-100, and DVS-CIFAR10).

Via

Access Paper or Ask Questions

Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

May 20, 2023

Man Yao, Yuhong Chou, Guangshe Zhao, Xiawu Zheng, Yonghong Tian, Bo Xu, Guoqi Li

Figure 1 for Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Figure 2 for Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Figure 3 for Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Figure 4 for Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Abstract:The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network (i.e., winning tickets) which, when trained in isolation, can achieve comparable performance to the large network. LTH opens up a new path for network pruning. Existing proofs of LTH in Artificial Neural Networks (ANNs) are based on continuous activation functions, such as ReLU, which satisfying the Lipschitz condition. However, these theoretical methods are not applicable in Spiking Neural Networks (SNNs) due to the discontinuous of spiking function. We argue that it is possible to extend the scope of LTH by eliminating Lipschitz condition. Specifically, we propose a novel probabilistic modeling approach for spiking neurons with complicated spatio-temporal dynamics. Then we theoretically and experimentally prove that LTH holds in SNNs. According to our theorem, we conclude that pruning directly in accordance with the weight size in existing SNNs is clearly not optimal. We further design a new criterion for pruning based on our theory, which achieves better pruning results than baseline.

* 22pages, 5 figures

Via

Access Paper or Ask Questions

Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

May 19, 2023

Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Zhengyu Ma, Huihui Zhou, Xiaopeng Fan, Yonghong Tian

Figure 1 for Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

Figure 2 for Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

Figure 3 for Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

Figure 4 for Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

Abstract:Deep spiking neural networks (SNNs) have drawn much attention in recent years because of their low power consumption, biological rationality and event-driven property. However, state-of-the-art deep SNNs (including Spikformer and Spikingformer) suffer from a critical challenge related to the imprecise gradient backpropagation. This problem arises from the improper design of downsampling modules in these networks, and greatly hampering the overall model performance. In this paper, we propose ConvBN-MaxPooling-LIF (CML), an SNN-optimized downsampling with precise gradient backpropagation. We prove that CML can effectively overcome the imprecision of gradient backpropagation from a theoretical perspective. In addition, we evaluate CML on ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, DVS128-Gesture datasets, and show state-of-the-art performance on all these datasets with significantly enhanced performances compared with Spikingformer. For instance, our model achieves 77.64 $\%$ on ImageNet, 96.04 $\%$ on CIFAR10, 81.4$\%$ on CIFAR10-DVS, with + 1.79$\%$ on ImageNet, +1.16$\%$ on CIFAR100 compared with Spikingformer.

* 12 pages

Via

Access Paper or Ask Questions

Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

Apr 25, 2023

Wei Fang, Zhaofei Yu, Zhaokun Zhou, Yanqi Chen, Zhengyu Ma, Timothée Masquelier, Yonghong Tian

Figure 1 for Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

Figure 2 for Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

Figure 3 for Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

Figure 4 for Parallel Spiking Neurons with High Efficiency and Long-term Dependencies Learning Ability

Abstract:Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated in serial and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics are reformulated in a non-iterative form and can be parallelized. By rewriting neuronal dynamics without resetting to a general formulation, we propose the Parallel Spiking Neuron (PSN), which uses dense connections between time-steps to maximize the utilization of temporal information. To avoid the use of future inputs for low-latency inference, we add masks on the weights and obtain the masked PSN. By sharing weights across time-steps, the sliding PSN is proposed with the ability to deal with sequences with variant lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To our best knowledge, this is the first research about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning community. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}.

Via

Access Paper or Ask Questions