What is Generative AI?
Generative AI, or generative artificial intelligence, refers to a type of AI that can create many kinds of content, including text, audio, music, images, videos, and code. It is powered by large models, called foundation models, that are trained on massive datasets and can perform tasks out of the box, including classification, summarization, video and audio comprehension, prediction, Q&A, and more.
Papers and Code
Oct 31, 2024
Abstract: Artificial Intelligence-Generated Content (AIGC) has the potential to transform how people build and interact with virtual environments. In this paper, we discuss the potential benefits, but also the challenges, that AIGC poses for the creation of inclusive and accessible virtual environments. Specifically, we touch upon the decreased need for 3D modeling expertise, the benefits of symbolic-only as well as multimodal input, 3D content editing, 3D model accessibility, and foundation model-specific challenges.
* Presented at the CHI 2024 Workshop "Building a Metaverse for All: Opportunities and Challenges for Future Inclusive and Accessible Virtual Environments", May 11, 2024, Honolulu, Hawaii
Oct 31, 2024
Abstract: Federated learning (FL) enables edge devices to collaboratively train a global model while preserving data privacy by keeping data local. However, the Non-IID nature of data distributions across devices often hinders model convergence and reduces performance. In this paper, we propose a novel plugin for federated optimization techniques that brings Non-IID data distributions closer to IID through generative AI-enhanced data augmentation and a balanced sampling strategy. The key idea is to synthesize additional data for underrepresented classes on each edge device, leveraging generative AI to create a more balanced dataset across the FL network. Additionally, a balanced sampling approach at the central server selectively includes only the most IID-like devices, accelerating convergence while maximizing the global model's performance. Experimental results validate that our approach significantly improves convergence speed and robustness against data imbalance, establishing a flexible, privacy-preserving FL plugin that is applicable even in data-scarce environments.
* 8 pages
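The two mechanisms in this abstract, per-device synthesis for underrepresented classes and server-side selection of IID-like devices, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' plugin: the "top every class up to the largest class" target and the use of KL divergence from a uniform label distribution as the IID-likeness score are illustrative choices, and all helper names are hypothetical. The generative model that would actually produce the synthetic samples is left out.

```python
import math
from collections import Counter
from typing import Dict, List

def synthesis_plan(labels: List[int], num_classes: int) -> Dict[int, int]:
    """Hypothetical helper: synthetic samples needed per class so that every
    class on this device matches the device's most frequent class."""
    counts = Counter(labels)
    target = max(counts.values(), default=0)
    return {c: target - counts.get(c, 0) for c in range(num_classes)}

def kl_to_uniform(labels: List[int], num_classes: int) -> float:
    """Assumed IID-likeness score: KL divergence of the device's label
    distribution from uniform (0.0 means perfectly balanced)."""
    n = len(labels)
    if n == 0:
        return math.inf
    kl = 0.0
    for count in Counter(labels).values():  # only classes present, so p > 0
        p = count / n
        kl += p * math.log(p * num_classes)
    return kl

def select_iid_like(devices: Dict[str, List[int]], num_classes: int, k: int) -> List[str]:
    """Server side: keep the k devices whose label distributions are closest to uniform."""
    ranked = sorted(devices, key=lambda d: kl_to_uniform(devices[d], num_classes))
    return ranked[:k]

devices = {"a": [0, 0, 0, 1], "b": [0, 1, 2, 2], "c": [0, 1, 2]}
print(synthesis_plan(devices["a"], 3))   # {0: 0, 1: 2, 2: 3}
print(select_iid_like(devices, 3, k=2))  # ['c', 'b']
```

In a full pipeline, the synthesis plan would drive a class-conditional generator on each device, after which that device's label distribution, and therefore its selection score, would change.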
Oct 31, 2024
Abstract: Image labeling is a critical bottleneck in the development of computer vision technologies, often constraining the potential of machine learning models due to the time-intensive nature of manual annotations. This work introduces a novel approach that leverages outpainting to address the problem of annotated data scarcity by generating artificial contexts and annotations, significantly reducing manual labeling efforts. We apply this technique to a particularly acute challenge in autonomous driving, urban planning, and environmental monitoring: the lack of diverse, eye-level vehicle images in desired classes. Our dataset comprises AI-generated vehicle images obtained by detecting and cropping vehicles from manually selected seed images, which are then outpainted onto larger canvases to simulate varied real-world conditions. The outpainted images include detailed annotations, providing high-quality ground truth data. Advanced outpainting techniques and image quality assessments ensure visual fidelity and contextual relevance. Augmentation with outpainted vehicles improves overall performance metrics by up to 8% and enhances prediction of underrepresented classes by up to 20%. This approach, exemplifying outpainting as a self-annotating paradigm, presents a solution that enhances dataset versatility across multiple domains of machine learning. The code and links to datasets used in this study are available for further research and replication at https://github.com/amir-kazemi/aidovecl.
* 19 pages, 4 figures, 3 tables
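Why outpainting is "self-annotating" comes down to simple bookkeeping: if the cropped vehicle is pasted at a known position on a larger canvas and only its surroundings are generated, the vehicle's bounding box is known exactly. The sketch below shows only that bookkeeping under our own assumptions: random placement and YOLO-style normalized labels are illustrative choices, not necessarily what the authors' pipeline uses, and the outpainting model itself is omitted.

```python
import random
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in canvas pixels

def place_crop(crop_w: int, crop_h: int, canvas_w: int, canvas_h: int,
               rng: random.Random) -> Box:
    """Pick a random top-left position for the cropped vehicle on a larger canvas.
    An outpainting model would then fill everything outside this box, so the
    returned box is already the ground-truth annotation."""
    if crop_w > canvas_w or crop_h > canvas_h:
        raise ValueError("crop must fit inside the canvas")
    x = rng.randint(0, canvas_w - crop_w)  # randint is inclusive on both ends
    y = rng.randint(0, canvas_h - crop_h)
    return (x, y, x + crop_w, y + crop_h)

def to_yolo(box: Box, canvas_w: int, canvas_h: int, class_id: int):
    """Convert a pixel box to a YOLO-style normalized (class, cx, cy, w, h) tuple."""
    x_min, y_min, x_max, y_max = box
    return (class_id,
            (x_min + x_max) / 2 / canvas_w,
            (y_min + y_max) / 2 / canvas_h,
            (x_max - x_min) / canvas_w,
            (y_max - y_min) / canvas_h)

box = place_crop(200, 120, 1024, 768, random.Random(0))
print(box, to_yolo(box, 1024, 768, class_id=2))
```

Because the label is derived from the placement rather than from a detector run on the generated image, its accuracy depends only on the outpainting model leaving the pasted region intact.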
Oct 31, 2024
Abstract: Climate decision-making is constrained by the complexity and inaccessibility of key information within lengthy, technical, and multilingual documents. Generative AI technologies offer a promising route for improving the accessibility of information contained within these documents, but suffer from limitations. These include (1) a tendency to hallucinate or misrepresent information, (2) difficulty in steering or guaranteeing properties of generated output, and (3) reduced performance in specific technical domains. To address these challenges, we introduce a novel evaluation framework with domain-specific dimensions tailored for climate-related documents. We then apply this framework to evaluate Retrieval-Augmented Generation (RAG) approaches and assess retrieval and generation quality within a prototype tool that answers questions about individual climate law and policy documents. In addition, we publish a human-annotated dataset and scalable automated evaluation tools, with the aim of facilitating broader adoption and robust assessment of these systems in the climate domain. Our findings highlight the key components of responsible deployment of RAG to enhance decision-making, while also providing insights into user experience (UX) considerations for safely deploying such systems to build trust with users in high-risk domains.
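Retrieval quality against a human-annotated dataset is commonly scored with rank-based metrics. The sketch below uses two standard ones, recall@k and reciprocal rank, as a stand-in; they are not the paper's domain-specific climate dimensions, and the passage IDs are hypothetical.

```python
from typing import Sequence, Set

def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of annotated relevant passages that appear in the top-k retrieved ones."""
    if not relevant:
        raise ValueError("need at least one annotated relevant passage")
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none was retrieved."""
    for rank, passage_id in enumerate(retrieved, start=1):
        if passage_id in relevant:
            return 1.0 / rank
    return 0.0

# One question about a (hypothetical) climate policy document.
retrieved = ["p3", "p7", "p1"]
relevant = {"p1", "p9"}
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.3333333333333333
```

Averaging these per-question scores over the annotated set gives a retrieval-side summary; generation-side properties such as faithfulness to the retrieved text need separate judgments.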
Oct 30, 2024
Abstract: This work takes a pedagogical lens to explore the implications of generative AI (GenAI) models and tools, such as ChatGPT and GitHub Copilot, in a semester-long 2nd-year undergraduate Software Engineering Team Project. Qualitative findings from a survey (39 students) and interviews (eight students) provide insights into the students' views on the impact of GenAI use on their coding experience, learning, and self-efficacy. Our results address a particular gap in understanding the role and implications of GenAI on teamwork, team-efficacy, and team dynamics. The analysis of the learning aspects is distinguished by the application of learning- and pedagogy-informed lenses to discuss the data. We propose a preliminary design space for GenAI-based programming learning tools, highlighting the importance of considering the roles that GenAI can play during the learning process, the varying support-ability patterns that can be applied to each role, and the importance of supporting transparency in GenAI for team members and students in addition to educators.
Oct 30, 2024
Abstract: Procedurally generated levels created by machine learning models can be unsolvable without further editing. Various methods have been developed to automatically repair these levels by enforcing hard constraints during the post-processing step. However, as levels increase in size, these constraint-based repairs become increasingly slow. This paper proposes using explainability methods to identify specific regions of a level that contribute to its unsolvability. By assigning higher weights to these regions, constraint-based solvers can prioritize these problematic areas, enabling more efficient repairs. Our results, tested across three games, demonstrate that this approach can help to repair procedurally generated levels faster.
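The prioritization idea can be illustrated on a toy grid level. In this sketch, "solvable" means a four-connected path of non-wall tiles links start to goal, the attribution map is given rather than computed by an explainability method, and a simple greedy search stands in for the constraint-based solver used in the paper: it opens walls in order of decreasing attribution until the level becomes solvable. All names and the level encoding are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

WALL, FLOOR = "#", "."
Cell = Tuple[int, int]

def solvable(grid: Sequence[Sequence[str]], start: Cell, goal: Cell) -> bool:
    """Breadth-first search over non-wall tiles with 4-neighbour moves."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and grid[nr][nc] != WALL):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

def prioritized_repair(level: List[str], start: Cell, goal: Cell,
                       attribution: List[List[float]]) -> Tuple[List[str], List[Cell]]:
    """Greedy stand-in for a weighted solver: open the most-blamed walls first."""
    grid = [list(row) for row in level]
    walls = [(r, c) for r, row in enumerate(grid) for c, t in enumerate(row) if t == WALL]
    walls.sort(key=lambda rc: attribution[rc[0]][rc[1]], reverse=True)
    edits: List[Cell] = []
    for r, c in walls:
        if solvable(grid, start, goal):
            break
        grid[r][c] = FLOOR
        edits.append((r, c))
    return ["".join(row) for row in grid], edits

level = ["..#..", "..#..", "..#.."]
attribution = [[0, 0, 0.2, 0, 0], [0, 0, 0.9, 0, 0], [0, 0, 0.1, 0, 0]]
fixed, edits = prioritized_repair(level, (0, 0), (0, 4), attribution)
print(fixed)  # ['..#..', '.....', '..#..']
print(edits)  # [(1, 2)]
```

Unlike a constraint solver, this greedy loop does not guarantee a minimal set of edits; it only shows how attribution scores change the order in which candidate repairs are tried.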
Oct 31, 2024
Abstract: Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory bandwidth. While recent dataflow architectures mitigate these overheads by enabling aggressive fusion of decoder layers into a single kernel, they too leave performance on the table due to synchronization penalties at layer boundaries. This paper presents kernel looping, a specialized global optimization technique that exploits an optimization opportunity created by combining the unique layer-level fusion possible in modern dataflow architectures with the repeated layer structure found in language models. Kernel looping eliminates synchronization costs between consecutive calls to the same kernel by transforming these calls into a single call to a modified kernel containing a pipelined outer loop. We evaluate kernel looping on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator for AI. Experiments demonstrate that kernel looping speeds up the decode phase of a wide array of powerful open-source models by up to 2.2× on SN40L. Kernel looping allows scaling of decode performance over multiple SN40L sockets, achieving speedups of up to 2.5×. Finally, kernel looping enables SN40L to achieve over 90% of peak performance on 8 and 16 sockets and achieve a speedup of up to 3.7× over DGX H100. Kernel looping and the models evaluated in this paper are deployed in production in a commercial AI inference cloud.
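The transformation can be pictured with a toy model of decoding that counts synchronization points. In the sketch, a Python function stands in for a fused decoder-layer kernel and a counter stands in for the synchronization paid at each kernel boundary; it shows only that looping inside one call preserves the result while replacing one boundary per layer with a single one. It does not model the pipelining of the outer loop on the RDU, and the timing numbers are hypothetical.

```python
from typing import List

class SyncCounter:
    """Counts synchronization points (stand-in for kernel-boundary overhead)."""
    def __init__(self) -> None:
        self.count = 0

    def sync(self) -> None:
        self.count += 1

def decoder_layer(weight: float, x: float) -> float:
    """Hypothetical stand-in for one fused decoder-layer kernel."""
    return weight * x + 1.0

def decode_unlooped(weights: List[float], x: float, s: SyncCounter) -> float:
    # One kernel call per layer: synchronization at every layer boundary.
    for w in weights:
        x = decoder_layer(w, x)
        s.sync()
    return x

def decode_looped(weights: List[float], x: float, s: SyncCounter) -> float:
    # One call to a modified kernel whose outer loop walks the repeated layers;
    # synchronization is paid once, when that single call returns.
    for w in weights:
        x = decoder_layer(w, x)
    s.sync()
    return x

def step_time_us(num_layers: int, compute_us: float, sync_us: float, looped: bool) -> float:
    """Toy cost model: per-layer compute plus one sync per kernel call."""
    syncs = 1 if looped else num_layers
    return num_layers * compute_us + syncs * sync_us

a, b = SyncCounter(), SyncCounter()
print(decode_unlooped([0.5] * 4, 1.0, a), decode_looped([0.5] * 4, 1.0, b))  # 1.9375 1.9375
print(a.count, b.count)  # 4 1
print(step_time_us(80, 10.0, 10.0, looped=False), step_time_us(80, 10.0, 10.0, looped=True))  # 1600.0 810.0
```

With these made-up numbers the toy model predicts roughly a 2× shorter decode step; real gains depend on how large synchronization costs are relative to compute, which is what the paper measures on SN40L.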
Oct 31, 2024
Abstract: Embodied agents designed to assist users with tasks must engage in natural language interactions, interpret instructions, execute actions, and communicate effectively to resolve issues. However, collecting large-scale, diverse datasets of situated human-robot dialogues to train and evaluate such agents is expensive, labor-intensive, and time-consuming. To address this challenge, we propose building a large language model (LLM)-based user agent that can simulate user behavior during interactions with an embodied agent in a virtual environment. Given a user goal (e.g., make breakfast), at each time step, the user agent may "observe" the robot's actions or "speak" to either intervene with the robot or answer questions. Such a user agent helps improve the scalability and efficiency of embodied dialogue dataset generation and is critical for enhancing and evaluating the robot's interaction and task completion abilities, as well as for research in reinforcement learning using AI feedback. We evaluate our user agent's ability to generate human-like behaviors by comparing its simulated dialogues with the TEACh dataset. We perform three experiments: zero-shot prompting to predict dialogue acts, few-shot prompting, and fine-tuning on the TEACh training subset. Results show the LLM-based user agent achieves an F-measure of 42% with zero-shot prompting and 43.4% with few-shot prompting in mimicking human speaking behavior. Through fine-tuning, performance in deciding when to speak remained stable, while deciding what to say improved from 51.1% to 62.5%. These findings showcase the feasibility of the proposed approach for assessing and enhancing the effectiveness of robot task completion through natural language communication.
* NeurIPS 2024 Workshop on Open-World Agents
* 8 pages, 5 figures, 4 tables
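The per-step decision the user agent makes can be sketched as a loop over robot events in which a policy returns either "observe" or "speak". In the paper this policy is an LLM (prompted zero-shot or few-shot, or fine-tuned on TEACh); here a hypothetical rule-based stub takes its place so the example runs without a model, and the event strings are invented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Turn:
    step: int
    action: str                      # "observe" or "speak"
    utterance: Optional[str] = None  # only set when the agent speaks

Policy = Callable[[str, str, List[Turn]], Tuple[str, Optional[str]]]

def simulate_user(goal: str, robot_events: List[str], policy: Policy) -> List[Turn]:
    """At each time step, ask the policy whether to observe or speak."""
    transcript: List[Turn] = []
    for step, event in enumerate(robot_events):
        action, utterance = policy(goal, event, transcript)
        if action not in ("observe", "speak"):
            raise ValueError(f"unknown action: {action}")
        transcript.append(Turn(step, action, utterance if action == "speak" else None))
    return transcript

def stub_policy(goal: str, event: str, history: List[Turn]) -> Tuple[str, Optional[str]]:
    """Hypothetical stand-in for the LLM: answer robot questions, otherwise watch."""
    if event.endswith("?"):
        return "speak", f"For '{goal}': try the cabinet."
    return "observe", None

events = ["Navigate to counter", "Where is the bread?", "Pick up bread"]
turns = simulate_user("make breakfast", events, stub_policy)
print([t.action for t in turns])  # ['observe', 'speak', 'observe']
```

Dataset generation then amounts to logging transcripts like this one; deciding when to speak and deciding what to say are the two behaviors the abstract evaluates separately.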
Oct 30, 2024
Abstract: As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. This approach requires a model of choice behavior that the agent can use to interpret the data. For choices between partial trajectories of states and actions, previous models assume choice probabilities to be determined by the partial return or the cumulative advantage. We consider an alternative model based instead on the bootstrapped return, which adds to the partial return an estimate of the future return. Benefits of the bootstrapped return model stem from its treatment of human beliefs. Unlike partial return, choices based on bootstrapped return reflect human beliefs about the environment. Further, while recovering the reward function from choices based on cumulative advantage requires that those beliefs are correct, doing so from choices based on bootstrapped return does not. To motivate the bootstrapped return model, we formulate axioms and prove an Alignment Theorem. This result formalizes how, for a general class of human preferences, such models are able to disentangle goals from beliefs. This ensures recovery of an aligned reward function when learning from choices based on bootstrapped return. The bootstrapped return model also affords greater robustness to choice behavior. Even when choices are based on partial return, learning via a bootstrapped return model recovers an aligned reward function. The same holds with choices based on the cumulative advantage if the human and the agent both adhere to correct and consistent beliefs about the environment. On the other hand, if choices are based on bootstrapped return, learning via partial return or cumulative advantage models does not generally produce an aligned reward function.
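The difference between the partial-return and bootstrapped-return models can be made concrete with a small calculation. The sketch assumes a logistic (Bradley-Terry-style) link from return difference to choice probability, a discount factor gamma, and a human value estimate V for each segment's final state; these are illustrative assumptions, since the abstract does not fix these details. Here the bootstrapped return of a segment with rewards r_0, ..., r_{k-1} ending in state s_k is the discounted partial return plus gamma^k * V(s_k).

```python
import math
from typing import Sequence

def partial_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Discounted sum of the rewards observed inside the segment."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def bootstrapped_return(rewards: Sequence[float], final_value: float,
                        gamma: float = 1.0) -> float:
    """Partial return plus the (believed) value of the segment's final state."""
    return partial_return(rewards, gamma) + gamma ** len(rewards) * final_value

def prob_prefer_a(score_a: float, score_b: float, temperature: float = 1.0) -> float:
    """Assumed logistic choice model: P(choose A) = sigmoid((score_a - score_b) / T)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b) / temperature))

# Segment A earns nothing yet but ends next to the goal; B collects small rewards
# and ends in a dead end. The final-state values are hypothetical human beliefs.
a_rewards, a_value = [0.0, 0.0], 10.0
b_rewards, b_value = [1.0, 1.0], 0.0

p_partial = prob_prefer_a(partial_return(a_rewards), partial_return(b_rewards))
p_boot = prob_prefer_a(bootstrapped_return(a_rewards, a_value),
                       bootstrapped_return(b_rewards, b_value))
print(f"{p_partial:.3f} {p_boot:.3f}")  # 0.119 1.000
```

Under the partial-return model, a human who picks A looks like a noisy chooser; the bootstrapped model explains the same choice through the human's belief about where A ends up, which is the role of beliefs the abstract emphasizes.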
Oct 28, 2024
Abstract: The applications of generative AI have become extremely impressive, and the interplay between users and AI is even more so. Current human-AI interaction literature has taken a broad look at how humans interact with generative AI, but it lacks specificity regarding the user interface designs and patterns used to create these applications. Therefore, we present a survey that comprehensively presents taxonomies of how a human interacts with AI and the user interaction patterns designed to meet the needs of a variety of relevant use cases. We focus primarily on user-guided interactions, surveying interactions that are initiated by the user and do not include any implicit signals given by the user. With this survey, we aim to create a compendium of different user-interaction patterns that can be used as a reference for designers and developers alike. In doing so, we also strive to lower the entry barrier for those attempting to learn more about the design of generative AI applications.