Neural operators extend the capabilities of traditional neural networks by allowing them to handle mappings between function spaces for the purpose of solving partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by Green's function method and approximate operator kernel directly in the frequency domain. In this work, we focus on predicting multiscale dynamic spaces, which is equivalent to solving multiscale PDEs. Multiscale PDEs are characterized by rapid coefficient changes and solution space oscillations, which are crucial for modeling atmospheric convection and ocean circulation. To solve this problem, models should have the ability to capture rapid changes and process them at various scales. However, the FNO only approximates kernels in the low-frequency domain, which is insufficient when solving multiscale PDEs. To address this challenge, we propose a novel hierarchical neural operator that integrates improved Fourier layers with attention mechanisms, aiming to capture all details and handle them at various scales. These mechanisms complement each other in the frequency domain and encourage the model to solve multiscale problems. We perform experiments on dynamic spaces governed by forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations and some other physical scenarios, and reach superior performance in existing PDE benchmarks, especially equations characterized by rapid coefficient variations.
Neural ordinary differential equations (ODEs) are widely recognized as the standard for modeling physical mechanisms, which help to perform approximate inference in unknown physical or biological environments. In partially observable (PO) environments, how to infer unseen information from raw observations puzzled the agents. By using a recurrent policy with a compact context, context-based reinforcement learning provides a flexible way to extract unobservable information from historical transitions. To help the agent extract more dynamics-related information, we present a novel ODE-based recurrent model combines with model-free reinforcement learning (RL) framework to solve partially observable Markov decision processes (POMDPs). We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks. Furthermore, our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
By integrating the self-attention capability and the biological properties of Spiking Neural Networks (SNNs), Spikformer applies the flourishing Transformer architecture to SNNs design. It introduces a Spiking Self-Attention (SSA) module to mix sparse visual features using spike-form Query, Key, and Value, resulting in the State-Of-The-Art (SOTA) performance on numerous datasets compared to previous SNN-like frameworks. In this paper, we demonstrate that the Spikformer architecture can be accelerated by replacing the SSA with an unparameterized Linear Transform (LT) such as Fourier and Wavelet transforms. These transforms are utilized to mix spike sequences, reducing the quadratic time complexity to log-linear time complexity. They alternate between the frequency and time domains to extract sparse visual features, showcasing powerful performance and efficiency. We conduct extensive experiments on image classification using both neuromorphic and static datasets. The results indicate that compared to the SOTA Spikformer with SSA, Spikformer with LT achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e., CIFAR-10 and CIFAR-100). Furthermore, Spikformer with LT achieves approximately 29-51% improvement in training speed, 61-70% improvement in inference speed, and reduces memory usage by 4-26% due to not requiring learnable parameters.
Spiking Neural Networks (SNNs) have garnered widespread interest for their energy efficiency and brain-inspired event-driven properties. While recent methods like Spiking-YOLO have expanded the SNNs to more challenging object detection tasks, they often suffer from high latency and low detection accuracy, making them difficult to deploy on latency sensitive mobile platforms. Furthermore, the conversion method from Artificial Neural Networks (ANNs) to SNNs is hard to maintain the complete structure of the ANNs, resulting in poor feature representation and high conversion errors. To address these challenges, we propose two methods: timesteps compression and spike-time-dependent integrated (STDI) coding. The former reduces the timesteps required in ANN-SNN conversion by compressing information, while the latter sets a time-varying threshold to expand the information holding capacity. We also present a SNN-based ultra-low latency and high accurate object detection model (SUHD) that achieves state-of-the-art performance on nontrivial datasets like PASCAL VOC and MS COCO, with about remarkable 750x fewer timesteps and 30% mean average precision (mAP) improvement, compared to the Spiking-YOLO on MS COCO datasets. To the best of our knowledge, SUHD is the deepest spike-based object detection model to date that achieves ultra low timesteps to complete the lossless conversion.
Adaptive human-agent and agent-agent cooperation are becoming more and more critical in the research area of multi-agent reinforcement learning (MARL), where remarked progress has been made with the help of deep neural networks. However, many established algorithms can only perform well during the learning paradigm but exhibit poor generalization during cooperation with other unseen partners. The personality theory in cognitive psychology describes that humans can well handle the above cooperation challenge by predicting others' personalities first and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for the efficient reinforcement learning. The benchmark Overcooked task, containing a strong requirement for cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that the MoP-SAN can achieve both high performances during not only the learning paradigm but also the generalization test (i.e., cooperation with other unseen agents) paradigm where most counterpart deep actor networks failed. Necessary ablation experiments and visualization analyses were conducted to explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNN performs poorly in the generalization test.
The spiking neural network (SNN) using leaky-integrated-and-fire (LIF) neurons has been commonly used in automatic speech recognition (ASR) tasks. However, the LIF neuron is still relatively simple compared to that in the biological brain. Further research on more types of neurons with different scales of neuronal dynamics is necessary. Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer to get the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). We found that the DyTr-SNN could handle the non-toy automatic speech recognition task well, representing a lower phoneme error rate, lower computational cost, and higher robustness. These results indicate that the further cooperation of SNNs and neural dynamics at the neuron and network scales might have much in store for the future, especially on the ASR tasks.
* 8 pages. Spiking Neural Networks, ASR, Speech and Language
Processing. The first three authors contributed equally
Network architectures and learning principles are playing key in forming complex functions in artificial neural networks (ANNs) and spiking neural networks (SNNs). SNNs are considered the new-generation artificial networks by incorporating more biological features than ANNs, including dynamic spiking neurons, functionally specified architectures, and efficient learning paradigms. Network architectures are also considered embodying the function of the network. Here, we propose a Motif-topology improved SNN (M-SNN) for the efficient multi-sensory integration and cognitive phenomenon simulations. The cognitive phenomenon simulation we simulated includes the cocktail party effect and McGurk effect, which are discussed by many researchers. Our M-SNN constituted by the meta operator called network motifs. The source of 3-node network motifs topology from artificial one pre-learned from the spatial or temporal dataset. In the single-sensory classification task, the results showed the accuracy of M-SNN using network motif topologies was higher than the pure feedforward network topology without using them. In the multi-sensory integration task, the performance of M-SNN using artificial network motif was better than the state-of-the-art SNN using BRP (biologically-plausible reward propagation). Furthermore, the M-SNN could better simulate the cocktail party effect and McGurk effect with lower computational cost. We think the artificial network motifs could be considered as some prior knowledge that would contribute to the multi-sensory integration of SNNs and provide more benefits for simulating the cognitive phenomenon.
Network architectures and learning principles are key in forming complex functions in artificial neural networks (ANNs) and spiking neural networks (SNNs). SNNs are considered the new-generation artificial networks by incorporating more biological features than ANNs, including dynamic spiking neurons, functionally specified architectures, and efficient learning paradigms. In this paper, we propose a Motif-topology and Reward-learning improved SNN (MR-SNN) for efficient multi-sensory integration. MR-SNN contains 13 types of 3-node Motif topologies which are first extracted from independent single-sensory learning paradigms and then integrated for multi-sensory classification. The experimental results showed higher accuracy and stronger robustness of the proposed MR-SNN than other conventional SNNs without using Motifs. Furthermore, the proposed reward learning paradigm was biologically plausible and can better explain the cognitive McGurk effect caused by incongruent visual and auditory sensory signals.