Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yonghong Tian

Pengcheng Laboratory, Peking University

Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

May 06, 2024

Chenlin Zhou, Han Zhang, Liutao Yu, Yumin Ye, Zhaokun Zhou, Liwei Huang, Zhengyu Ma, Xiaopeng Fan, Huihui Zhou, Yonghong Tian

Figure 1 for Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Figure 2 for Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Figure 3 for Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Figure 4 for Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Abstract:Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks (ANNs), in virtue of their high biological plausibility, rich spatial-temporal dynamics, and event-driven computation. The direct training algorithms based on the surrogate gradient method provide sufficient flexibility to design novel SNN architectures and explore the spatial-temporal dynamics of SNNs. According to previous studies, the performance of models is highly dependent on their sizes. Recently, direct training deep SNNs have achieved great progress on both neuromorphic datasets and large-scale static datasets. Notably, transformer-based SNNs show comparable performance with their ANN counterparts. In this paper, we provide a new perspective to summarize the theories and methods for training deep SNNs with high performance in a systematic and comprehensive way, including theory fundamentals, spiking neuron models, advanced SNN models and residual architectures, software frameworks and neuromorphic hardware, applications, and future trends. The reviewed papers are collected at https://github.com/zhouchenlin2096/Awesome-Spiking-Neural-Networks

* 29 pages

Via

Access Paper or Ask Questions

Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Apr 27, 2024

Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, Yaowei Wang, Yonghong Tian

Figure 1 for Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Figure 2 for Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Figure 3 for Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Figure 4 for Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Abstract:Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on a static image, however, the performance is unreliable in challenging scenarios, such as heavy occlusion, motion blur, etc. In this work, we propose to understand human attributes using video frames that can fully use temporal information by fine-tuning a pre-trained multi-modal foundation model efficiently. Specifically, we formulate the video-based PAR as a vision-language fusion problem and adopt a pre-trained foundation model CLIP to extract the visual features. More importantly, we propose a novel spatiotemporal side-tuning strategy to achieve parameter-efficient optimization of the pre-trained vision foundation model. To better utilize the semantic information, we take the full attribute list that needs to be recognized as another input and transform the attribute words/phrases into the corresponding sentence via split, expand, and prompt operations. Then, the text encoder of CLIP is utilized for embedding processed attribute descriptions. The averaged visual tokens and text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning. The enhanced tokens will be fed into a classification head for pedestrian attribute prediction. Extensive experiments on two large-scale video-based PAR datasets fully validated the effectiveness of our proposed framework. The source code of this paper is available at https://github.com/Event-AHU/OpenPAR.

* Parameter Efficient Fine-Tuning Strategy for Video-based Pedestrian Attribute Recognition

Via

Access Paper or Ask Questions

State Space Model for New-Generation Network Alternative to Transformers: A Survey

Apr 15, 2024

Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang(+6 more)

Abstract:In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.

* The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages

Via

Access Paper or Ask Questions

MTLight: Efficient Multi-Task Reinforcement Learning for Traffic Signal Control

Apr 01, 2024

Liwen Zhu, Peixi Peng, Zongqing Lu, Yonghong Tian

Abstract:Traffic signal control has a great impact on alleviating traffic congestion in modern cities. Deep reinforcement learning (RL) has been widely used for this task in recent years, demonstrating promising performance but also facing many challenges such as limited performances and sample inefficiency. To handle these challenges, MTLight is proposed to enhance the agent observation with a latent state, which is learned from numerous traffic indicators. Meanwhile, multiple auxiliary and supervisory tasks are constructed to learn the latent state, and two types of embedding latent features, the task-specific feature and task-shared feature, are used to make the latent state more abundant. Extensive experiments conducted on CityFlow demonstrate that MTLight has leading convergence speed and asymptotic performance. We further simulate under peak-hour pattern in all scenarios with increasing control difficulty and the results indicate that MTLight is highly adaptable.

Via

Access Paper or Ask Questions

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Mar 25, 2024

Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian

Figure 1 for QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Figure 2 for QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Figure 3 for QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Figure 4 for QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Abstract:Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer

* 10 pages, code: https://github.com/zhouchenlin2096/QKFormer

Via

Access Paper or Ask Questions

Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Mar 09, 2024

Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo

Figure 1 for Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Figure 2 for Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Figure 3 for Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Figure 4 for Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Abstract:Current event-/frame-event based trackers undergo evaluation on short-term tracking datasets, however, the tracking of real-world scenarios involves long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frames and event stream pairs and has become the largest frame-event tracking dataset to date. We re-train and evaluate 15 baseline trackers on our dataset for future works to compare. More importantly, we find that the RGB frames and event streams are naturally incomplete due to the influence of challenging factors and spatially sparse event flow. In response to this, we propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data. Extensive experiments on both FELT and RGB-T tracking dataset LasHeR fully validated the effectiveness of our model. The dataset and source code can be found at \url{https://github.com/Event-AHU/FELT_SOT_Benchmark}.

* In Peer Review

Via

Access Paper or Ask Questions

Noisy Spiking Actor Network for Exploration

Mar 07, 2024

Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian

Abstract:As a general method for exploration in deep reinforcement learning (RL), NoisyNet can produce problem-specific exploration strategies. Spiking neural networks (SNNs), due to their binary firing mechanism, have strong robustness to noise, making it difficult to realize efficient exploration with local disturbances. To solve this exploration problem, we propose a noisy spiking actor network (NoisySAN) that introduces time-correlated noise during charging and transmission. Moreover, a noise reduction method is proposed to find a stable policy for the agent. Extensive experimental results demonstrate that our method outperforms the state-of-the-art performance on a wide range of continuous control tasks from OpenAI gym.

* 13 pages, 6 figures

Via

Access Paper or Ask Questions

Adaptive Discovering and Merging for Incremental Novel Class Discovery

Mar 06, 2024

Guangyao Chen, Peixi Peng, Yangru Huang, Mengyue Geng, Yonghong Tian

Abstract:One important desideratum of lifelong learning aims to discover novel classes from unlabelled data in a continuous manner. The central challenge is twofold: discovering and learning novel classes while mitigating the issue of catastrophic forgetting of established knowledge. To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptively in the incremental stage and integrate novel knowledge into the model without affecting the original knowledge. To discover novel classes adaptively, we decouple representation learning and novel class discovery, and use Triple Comparison (TC) and Probability Regularization (PR) to constrain the probability discrepancy and diversity for adaptive category assignment. To merge the learned novel knowledge adaptively, we propose a hybrid structure with base and novel branches named Adaptive Model Merging (AMM), which reduces the interference of the novel branch on the old classes to preserve the previous knowledge, and merges the novel branch to the base model without performance loss and parameter growth. Extensive experiments on several datasets show that ADM significantly outperforms existing class-incremental Novel Class Discovery (class-iNCD) approaches. Moreover, our AMM also benefits the class-incremental Learning (class-IL) task by alleviating the catastrophic forgetting problem.

* AAAI 2024. arXiv admin note: text overlap with arXiv:2207.08605 by other authors

Via

Access Paper or Ask Questions

Optimal ANN-SNN Conversion with Group Neurons

Feb 29, 2024

Liuzhenghao Lv, Wei Fang, Li Yuan, Yonghong Tian

Figure 1 for Optimal ANN-SNN Conversion with Group Neurons

Figure 2 for Optimal ANN-SNN Conversion with Group Neurons

Figure 3 for Optimal ANN-SNN Conversion with Group Neurons

Figure 4 for Optimal ANN-SNN Conversion with Group Neurons

Abstract:Spiking Neural Networks (SNNs) have emerged as a promising third generation of neural networks, offering unique characteristics such as binary outputs, high sparsity, and biological plausibility. However, the lack of effective learning algorithms remains a challenge for SNNs. For instance, while converting artificial neural networks (ANNs) to SNNs circumvents the need for direct training of SNNs, it encounters issues related to conversion errors and high inference time delays. In order to reduce or even eliminate conversion errors while decreasing inference time-steps, we have introduced a novel type of neuron called Group Neurons (GNs). One GN is composed of multiple Integrate-and-Fire (IF) neurons as members, and its neural dynamics are meticulously designed. Based on GNs, we have optimized the traditional ANN-SNN conversion framework. Specifically, we replace the IF neurons in the SNNs obtained by the traditional conversion framework with GNs. The resulting SNNs, which utilize GNs, are capable of achieving accuracy levels comparable to ANNs even within extremely short inference time-steps. The experiments on CIFAR10, CIFAR100, and ImageNet datasets demonstrate the superiority of the proposed methods in terms of both inference accuracy and latency. Code is available at https://github.com/Lyu6PosHao/ANN2SNN_GN.

* Accepted by International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024

Via

Access Paper or Ask Questions

Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning

Jan 09, 2024

Ding Chen, Peixi Peng, Tiejun Huang, Yonghong Tian

Abstract:With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. It provides a promising energy-efficient way for realistic control tasks by combining SNNs with deep reinforcement learning (DRL). In this paper, we focus on the task where the agent needs to learn multi-dimensional deterministic policies to control, which is very common in real scenarios. Recently, the surrogate gradient method has been utilized for training multi-layer SNNs, which allows SNNs to achieve comparable performance with the corresponding deep networks in this task. Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected (FC) layer. However, the decimal characteristic of the firing rate brings the floating-point matrix operations to the FC layer, making the whole SNN unable to deploy on the neuromorphic hardware directly. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and employ the membrane voltage of the non-spiking neurons to represent the action. Before the non-spiking neurons, multiple population neurons are introduced to decode different dimensions of actions. Since each population is used to decode a dimension of action, we argue that the neurons in each population should be connected in time domain and space domain. Hence, the intra-layer connections are used in output populations to enhance the representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).

* 13 pages, 6 figures

Via

Access Paper or Ask Questions