Abstract:Softmax-based dot-product attention is a cornerstone of Transformer architectures, enabling remarkable capabilities such as in-context learning. However, as context lengths increase, a fundamental limitation of the softmax function emerges: it tends to diffuse probability mass to irrelevant tokens degrading performance in long-sequence scenarios. Furthermore, attempts to sharpen focus by lowering softmax temperature hinder learnability due to vanishing gradients. We introduce LUCID Attention, an architectural modification that applies a preconditioner to the attention probabilities. This preconditioner, derived from exponentiated key-key similarities, minimizes overlap between the keys in a Reproducing Kernel Hilbert Space, thus allowing the query to focus on important keys among large number of keys accurately with same computational complexity as standard attention. Additionally, LUCID's preconditioning-based approach to retrieval bypasses the need for low temperature and the learnability problems associated with it. We validate our approach by training ~1 billion parameter language models evaluated on up to 128K tokens. Our results demonstrate significant gains on long-context retrieval tasks, specifically retrieval tasks from BABILong, RULER, SCROLLS and LongBench. For instance, LUCID achieves up to 18% improvement in BABILong and 14% improvement in RULER multi-needle performance compared to standard attention.
Abstract:Generative AI systems are increasingly capable of expressing emotions via text and imagery. Effective emotional expression will likely play a major role in the efficacy of AI systems -- particularly those designed to support human mental health and wellbeing. This motivates our present research to better understand the alignment of AI expressed emotions with the human perception of emotions. When AI tries to express a particular emotion, how might we assess whether they are successful? To answer this question, we designed a survey to measure the alignment between emotions expressed by generative AI and human perceptions. Three generative image models (DALL-E 2, DALL-E 3 and Stable Diffusion v1) were used to generate 240 examples of images, each of which was based on a prompt designed to express five positive and five negative emotions across both humans and robots. 24 participants recruited from the Prolific website rated the alignment of AI-generated emotional expressions with a text prompt used to generate the emotion (i.e., "A robot expressing the emotion amusement"). The results of our evaluation suggest that generative AI models are indeed capable of producing emotional expressions that are well-aligned with a range of human emotions; however, we show that the alignment significantly depends upon the AI model used and the emotion itself. We analyze variations in the performance of these systems to identify gaps for future improvement. We conclude with a discussion of the implications for future AI systems designed to support mental health and wellbeing.

Abstract:In this paper, we describe our Knowledge Tracing model for the 2020 NeurIPS Education Challenge. We used a combination of 22 models to predict whether the students will answer a given question correctly or not. Our combination of different approaches allowed us to get an accuracy higher than any of the individual models, and the variation of our model types gave our solution better explainability, more alignment with learning science theories, and high predictive power.