The feed-forward networks (FFNs) in transformers are recognized as a group of key-value neural memories to restore abstract high-level knowledge. In this work, we conduct an empirical ablation study on updating keys (the 1st layer in the FFNs layer) or values (the 2nd layer in the FFNs layer). We compare those two methods in various knowledge editing and fine-tuning tasks of large language models to draw insights to understand FFNs further. Code is available at $\href{https://github.com/qiuzh20/Tuning-keys-v.s.-values}{this\,repo}$.
Knowledge-editing updates knowledge of large language models (LLMs) and contributes to the interpretability and application of LLMs. However, knowledge applying is context-consistent: LLMs can recall the same knowledge in different contexts. Existing works ignore this property and the editing lacks generalization. In this paper, we empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution. We then sample Gaussian noises to simulate the effects of different contexts when updating LLMs. By such, we can make LLMs see the unseen contexts where the edited knowledge will be applied, therefore improving the editing generalization. Experimental results on three LLMs demonstrate the effectiveness of our methods and also distinguish our methods from the others of fine-tuning LLMs by noises.
Incorporating large-scale pre-trained models with the prototypical neural networks is a de-facto paradigm in few-shot named entity recognition. Existing methods, unfortunately, are not aware of the fact that embeddings from pre-trained models contain a prominently large amount of information regarding word frequencies, biasing prototypical neural networks against learning word entities. This discrepancy constrains the two models' synergy. Thus, we propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds. Our experiments based on nine benchmark datasets show the superiority of our method over the counterpart models and are comparable to the state-of-the-art methods. In addition to the model enhancement, our work also provides an analytical viewpoint for addressing the general problems in few-shot name entity recognition or other tasks that rely on pre-trained models or prototypical neural networks.
Hybrid data combining both tabular and textual content (e.g., financial reports) are quite pervasive in the real world. However, Question Answering (QA) over such hybrid data is largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning is usually required to infer the answer, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and the compositions. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. TAGOPachieves 58.0% inF1, which is an 11.1% absolute increase over the previous best baseline model, according to our experiments on TAT-QA. But this result still lags far behind performance of expert human, i.e.90.8% in F1. It is demonstrated that our TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid form data.
From childhood to youth, human gradually come to know the world. But for neural networks, this growing process seems difficult. Trapped in catastrophic forgetting, current researchers feed data of all categories to a neural network which remains the same structure in the whole training process. We compare this training process with human learing patterns, and find two major conflicts. In this paper, we study how to solve these conflicts on generative models based on the conditional variational autoencoder(CVAE) model. To solve the uncontinuous conflict, we apply memory playback strategy to maintain the model's recognizing and generating ability on invisible used categories. And we extend the traditional one-way CVAE to a circulatory mode to better accomplish memory playback strategy. To solve the `dead' structure conflict, we rewrite the CVAE formula then are able to make a novel interpretation about the funtions of different parts in CVAE models. Based on the new understanding, we find ways to dynamically extend the network structure when training on new categories. We verify the effectiveness of our methods on MNIST and Fashion MNIST and display some very insteresting results.