Abstract:Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.




Abstract:From face recognition systems installed in phones to self-driving cars, the field of AI is witnessing rapid transformations and is being integrated into our everyday lives at an incredible pace. Any major failure in these system's predictions could be devastating, leaking sensitive information or even costing lives (as in the case of self-driving cars). However, deep neural networks, which form the basis of such systems, are highly susceptible to a specific type of attack, called adversarial attacks. A hacker can, even with bare minimum computation, generate adversarial examples (images or data points that belong to another class, but consistently fool the model to get misclassified as genuine) and crumble the basis of such algorithms. In this paper, we compile and test numerous approaches to defend against such adversarial attacks. Out of the ones explored, we found two effective techniques, namely Dropout and Denoising Autoencoders, and show their success in preventing such attacks from fooling the model. We demonstrate that these techniques are also resistant to both higher noise levels as well as different kinds of adversarial attacks (although not tested against all). We also develop a framework for deciding the suitable defense technique to use against attacks, based on the nature of the application and resource constraints of the Deep Neural Network.