Large Language Models (LLMs) have exploded a new heatwave of AI, for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities of the LLMs, categorising them into inherent issues, intended attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and ethical use. Considering the fast development of LLMs, this survey does not intend to be complete (although it includes 300 references), especially when it comes to the applications of LLMs in various domains, but rather a collection of organised literature reviews and discussions to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V.
In recent years, there has been an explosion of research into developing more robust deep neural networks against adversarial examples. Adversarial training appears as one of the most successful methods. To deal with both the robustness against adversarial examples and the accuracy over clean examples, many works develop enhanced adversarial training methods to achieve various trade-offs between them. Leveraging over the studies that smoothed update on weights during training may help find flat minima and improve generalization, we suggest reconciling the robustness-accuracy trade-off from another perspective, i.e., by adding random noise into deterministic weights. The randomized weights enable our design of a novel adversarial training method via Taylor expansion of a small Gaussian noise, and we show that the new adversarial training method can flatten loss landscape and find flat minima. With PGD, CW, and Auto Attacks, an extensive set of experiments demonstrate that our method enhances the state-of-the-art adversarial training methods, boosting both robustness and clean accuracy. The code is available at https://github.com/Alexkael/Randomized-Adversarial-Training.
Spiking neural networks (SNNs), a variant of artificial neural networks (ANNs) with the benefit of energy efficiency, have achieved the accuracy close to its ANN counterparts, on benchmark datasets such as CIFAR10/100 and ImageNet. However, comparing with frame-based input (e.g., images), event-based inputs from e.g., Dynamic Vision Sensor (DVS) can make a better use of SNNs thanks to the SNNs' asynchronous working mechanism. In this paper, we strengthen the marriage between SNNs and event-based inputs with a proposal to consider anytime optimal inference SNNs, or AOI-SNNs, which can terminate anytime during the inference to achieve optimal inference result. Two novel optimisation techniques are presented to achieve AOI-SNNs: a regularisation and a cutoff. The regularisation enables the training and construction of SNNs with optimised performance, and the cutoff technique optimises the inference of SNNs on event-driven inputs. We conduct an extensive set of experiments on multiple benchmark event-based datasets, including CIFAR10-DVS, N-Caltech101 and DVS128 Gesture. The experimental results demonstrate that our techniques are superior to the state-of-the-art with respect to the accuracy and latency.
Spiking neural networks (SNNs) offer an inherent ability to process spatial-temporal data, or in other words, realworld sensory data, but suffer from the difficulty of training high accuracy models. A major thread of research on SNNs is on converting a pre-trained convolutional neural network (CNN) to an SNN of the same structure. State-of-the-art conversion methods are approaching the accuracy limit, i.e., the near-zero accuracy loss of SNN against the original CNN. However, we note that this is made possible only when significantly more energy is consumed to process an input. In this paper, we argue that this trend of "energy for accuracy" is not necessary -- a little energy can go a long way to achieve the near-zero accuracy loss. Specifically, we propose a novel CNN-to-SNN conversion method that is able to use a reasonably short spike train (e.g., 256 timesteps for CIFAR10 images) to achieve the near-zero accuracy loss. The new conversion method, named as explicit current control (ECC), contains three techniques (current normalisation, thresholding for residual elimination, and consistency maintenance for batch-normalisation), in order to explicitly control the currents flowing through the SNN when processing inputs. We implement ECC into a tool nicknamed SpKeras, which can conveniently import Keras CNN models and convert them into SNNs. We conduct an extensive set of experiments with the tool -- working with VGG16 and various datasets such as CIFAR10 and CIFAR100 -- and compare with state-of-the-art conversion methods. Results show that ECC is a promising method that can optimise over energy consumption and accuracy loss simultaneously.