Honglak Lee

University of Michigan, Ann Arbor

Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning

Sep 08, 2023

Sungjun Cho, Seunghyuk Cho, Sungwoo Park, Hankook Lee, Honglak Lee, Moontae Lee

Abstract: Real-world graphs naturally exhibit hierarchical or cyclical structures that are ill-suited to the typical Euclidean space. While there exist graph neural networks that leverage hyperbolic or spherical spaces to learn representations that embed such structures more accurately, these methods are confined to the message-passing paradigm, making the models vulnerable to side effects such as oversmoothing and oversquashing. More recent work has proposed global attention-based graph Transformers that can easily model long-range interactions, but their extensions to non-Euclidean geometry remain unexplored. To bridge this gap, we propose the Fully Product-Stereographic Transformer, a generalization of Transformers that operates entirely on the product of constant-curvature spaces. When combined with tokenized graph Transformers, our model can learn the curvature appropriate for the input graph in an end-to-end fashion, without the need for additional tuning over different curvature initializations. We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory linear in the number of nodes and edges while respecting the underlying geometry. Experiments on graph reconstruction and node classification demonstrate the benefits of generalizing Transformers to the non-Euclidean domain.

* 19 pages, 7 figures
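
The linear-cost claim above rests on kernelized attention. The sketch below covers only the Euclidean case (the paper's product-stereographic version is not reproduced here) and assumes the elu(x) + 1 feature map popularized by linear Transformers. It checks that computing phi(K)^T V once and reusing it for every query gives the same output as the quadratic form that materializes all N x N weights.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    """elu(x) + 1, a strictly positive feature map (an assumed choice of kernel)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Kernelized attention whose cost grows linearly with the number of tokens N."""
    phi_q, phi_k = feature_map(q), feature_map(k)   # (N, d) each
    kv = phi_k.T @ v                                # (d, d_v), one pass over all keys
    normalizer = phi_q @ phi_k.sum(axis=0)          # (N,), row sums of the implicit weights
    return (phi_q @ kv) / normalizer[:, None]


def quadratic_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Same kernel, but materializing the full N x N weight matrix for comparison."""
    weights = feature_map(q) @ feature_map(k).T     # (N, N)
    return (weights @ v) / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 6, 4))   # 6 tokens (e.g. node and edge tokens), width 4
out = linear_attention(q, k, v)
print(out.shape, np.allclose(out, quadratic_attention(q, k, v)))  # → (6, 4) True
```

The equality follows from associativity, (phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V), which is what lets a kernelized model avoid forming the N x N attention matrix.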

Go Beyond Imagination: Maximizing Episodic Reachability with World Models

Aug 25, 2023

Yao Fu, Run Peng, Honglak Lee

Abstract: Efficient exploration is a challenging problem in reinforcement learning, especially for sparse-reward tasks. To deal with reward sparsity, intrinsic rewards are commonly used to motivate agents to explore the state space efficiently. In this paper, we introduce a new intrinsic reward design called GoBI (Go Beyond Imagination), which combines the traditional lifelong novelty motivation with an episodic intrinsic reward designed to maximize stepwise reachability expansion. More specifically, we apply learned world models to generate predicted future states with random actions. States with more unique predictions that are not in episodic memory are assigned high intrinsic rewards. Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks and improves sample efficiency on locomotion tasks from the DeepMind Control Suite.

* Published at the 40th International Conference on Machine Learning (ICML 2023)
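
A minimal sketch of the episodic part of the reward described above, under loudly stated assumptions: a one-step horizon, a hypothetical `toy_world_model` that applies exact grid dynamics in place of a learned model, and episodic memory holding only visited states. How GoBI combines this term with lifelong novelty is not shown.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # illustrative 4-way grid moves
GRID_SIZE = 5                                   # illustrative 5x5 grid


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model: exact, wall-clipped grid dynamics."""
    x, y = state
    dx, dy = action
    nx = min(max(x + dx, 0), GRID_SIZE - 1)
    ny = min(max(y + dy, 0), GRID_SIZE - 1)
    return (nx, ny)


def episodic_reachability_reward(state, episodic_memory, model, num_samples=200, seed=0):
    """Number of distinct predicted next states (random actions) not yet in episodic memory."""
    rng = random.Random(seed)
    predictions = {model(state, rng.choice(ACTIONS)) for _ in range(num_samples)}
    return len(predictions - episodic_memory)


memory = {(2, 2)}                                  # states visited so far this episode
r1 = episodic_reachability_reward((2, 2), memory, toy_world_model)
memory.add((2, 3))                                 # the agent moves to (2, 3)
r2 = episodic_reachability_reward((2, 3), memory, toy_world_model)
print(r1, r2)  # → 4 3: one neighbour of (2, 3), namely (2, 2), was already visited
```

States whose imagined successors are mostly new earn more reward, which pushes the agent toward the frontier of what it has reached in the current episode.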

Exploring Demonstration Ensembling for In-context Learning

Aug 21, 2023

Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang

Abstract: In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task, i.e., demonstrations. The standard approach for ICL is to prompt the LM with concatenated demonstrations followed by the test input. This approach has two issues. First, concatenation offers almost no control over the contribution of each demonstration to the model prediction, which can be sub-optimal when some demonstrations are irrelevant to the test example. Second, due to the input length limit of some Transformer models, it might be infeasible to fit many examples into the context, especially when dealing with long-input tasks. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation. DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and then combines the output probabilities resulting from each subset to produce the final prediction. We study different ensembling methods using GPT-J and experiment on 12 language tasks. Our experiments show that weighted max ensembling outperforms vanilla concatenation by as much as 2.4 average points. Code is available at https://github.com/mukhal/icl-ensembling.

* Published at the ME-FoMo workshop at ICLR 2023. The arXiv version includes evaluation on 5 more tasks.
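
To make the bucket-then-combine idea concrete, here is a sketch under assumptions: a round-robin bucketing rule, per-bucket label distributions supplied directly (in practice they would come from prompting the LM once per bucket), and one plausible reading of "weighted max" in which each label takes its highest weighted bucket probability. The probabilities and weights are made-up illustrative numbers, not results from the paper.

```python
import numpy as np


def make_buckets(demos, num_buckets):
    """Split demonstrations into buckets round-robin (one simple, assumed strategy)."""
    return [demos[i::num_buckets] for i in range(num_buckets)]


def ensemble_predict(bucket_probs, weights, method="weighted_max"):
    """Combine per-bucket label distributions into one predicted label index.

    bucket_probs: (num_buckets, num_labels) probabilities, one row per bucket prompt.
    weights: one non-negative weight per bucket (e.g. relevance to the test input).
    """
    probs = np.asarray(bucket_probs, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    if method == "mixture":            # weighted average of the distributions
        scores = (w * probs).sum(axis=0)
    elif method == "product":          # weighted product of experts, in log space
        scores = (w * np.log(probs)).sum(axis=0)
    elif method == "weighted_max":     # per label, the strongest weighted bucket wins
        scores = (w * probs).max(axis=0)
    else:
        raise ValueError(f"unknown method: {method}")
    return int(np.argmax(scores))


print(make_buckets(list(range(7)), 3))   # → [[0, 3, 6], [1, 4], [2, 5]]
probs = [[0.6, 0.4], [0.1, 0.9], [0.55, 0.45]]
print(ensemble_predict(probs, [1.0, 0.5, 1.0], "weighted_max"))  # → 0
print(ensemble_predict(probs, [1.0, 1.0, 1.0], "mixture"))       # → 1
```

The two calls disagree on purpose: down-weighting the second bucket (for instance because its demonstrations look less relevant) changes the max-based decision, which is the kind of per-demonstration control that plain concatenation lacks.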

Story Visualization by Online Text Augmentation with Context Memory

Aug 19, 2023

Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi

Abstract: Story visualization (SV) is a challenging text-to-image generation task because it requires not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with the correct character or a proper scene background) remains a challenge. To this end, we propose a novel memory architecture for the bi-directional Transformer framework, together with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training, for better generalization to language variation at inference. In extensive experiments on two popular SV benchmarks, Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or lower computational complexity.

* ICCV 2023, Project page: https://dcahn12.github.io/projects/CMOTA/

Scalable 3D Captioning with Pretrained Models

Jun 16, 2023

Tiange Luo, Chris Rockwell, Honglak Lee, Justin Johnson

Abstract: We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. This approach uses pretrained models for image captioning and image-text alignment, together with an LLM, to consolidate captions from multiple views of a 3D asset, completely side-stepping the time-consuming and costly process of manual annotation. We apply Cap3D to the recently introduced large-scale 3D dataset Objaverse, resulting in 660k 3D-text pairs. Our evaluation, conducted using 41k human annotations from the same dataset, demonstrates that Cap3D surpasses human-authored descriptions in terms of quality, cost, and speed. Through effective prompt engineering, Cap3D rivals human performance in generating geometric descriptions on 17k collected annotations from the ABO dataset. Finally, we finetune Text-to-3D models on Cap3D and human captions, show that Cap3D captions outperform human captions, and benchmark state-of-the-art models including Point-E, Shap-E, and DreamFusion.

* Dataset link: https://huggingface.co/datasets/tiange/Cap3D
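
The multi-view captioning pipeline described above can be sketched as plain orchestration code. Everything here is hypothetical: the `Captioner`, `Aligner`, and `Consolidator` callables stand in for a pretrained captioner, an image-text alignment scorer, and an LLM, and views are plain strings rather than rendered images. This is an illustration of the data flow under those assumptions, not Cap3D's code.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces standing in for pretrained models (not Cap3D's actual API).
Captioner = Callable[[str], List[str]]          # rendered view -> candidate captions
Aligner = Callable[[str, str], float]           # (view, caption) -> alignment score
Consolidator = Callable[[Sequence[str]], str]   # per-view captions -> one description


def caption_asset(views: Sequence[str], captioner: Captioner,
                  aligner: Aligner, consolidator: Consolidator) -> str:
    """Caption every view, keep the best-aligned candidate per view, then merge."""
    per_view = []
    for view in views:
        candidates = captioner(view)
        per_view.append(max(candidates, key=lambda caption: aligner(view, caption)))
    return consolidator(per_view)


# Toy stubs so the sketch runs end to end.
def toy_captioner(view: str) -> List[str]:
    return [f"a wooden chair seen from the {view}", "a blurry object"]


def toy_aligner(view: str, caption: str) -> float:
    return 1.0 if view in caption else 0.0      # toy score: caption mentions the view


def toy_consolidator(captions: Sequence[str]) -> str:
    return "; ".join(captions)                  # an LLM would summarize instead


result = caption_asset(["front", "side", "top"], toy_captioner, toy_aligner, toy_consolidator)
print(result)
```

Filtering per view before consolidation keeps the final LLM step from having to reconcile obviously poor candidates.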

Fine-grained Text Style Transfer with Diffusion-Based Language Models

Jun 12, 2023

Yiwei Lyu, Tiange Luo, Jiacheng Shi, Todd C. Hollon, Honglak Lee

Abstract: Diffusion probabilistic models have shown great success in generating high-quality images controllably, and researchers have tried to bring this controllability to the text generation domain. Previous work on diffusion-based language models has shown that they can be trained without external knowledge (such as pre-trained weights) and still achieve stable performance and controllability. In this paper, we train a diffusion-based model on the StylePTB dataset, the standard benchmark for fine-grained text style transfer. The tasks in StylePTB require much more refined control over the output text than the tasks evaluated in previous work, and our model achieves state-of-the-art performance on StylePTB for both individual and compositional transfers. Moreover, our model, trained on limited data from StylePTB without external knowledge, outperforms previous work that utilized pretrained weights, embeddings, and external grammar parsers, which may indicate that diffusion-based language models have great potential in low-resource settings.

* Accepted at the Repl4NLP workshop at ACL 2023

Discriminator-Guided Multi-step Reasoning with Language Models

May 24, 2023

Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang

Abstract: In the context of multi-step reasoning, language model (LM) probabilities are often miscalibrated: solutions with high probabilities are not always correct. Therefore, greedy decoding, the standard decoding method for reasoning tasks, often yields incorrect solutions. In addition, methods such as self-consistency and verifiers rely on sampling from the LM distribution and do not tackle the underlying issue. To address this, we introduce Guiding Multi-step ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that nudges the model towards producing correct reasoning steps. GRACE employs a discriminator model, trained to differentiate correct steps from invalid ones, to adjust decoding preferences based on the correctness of each reasoning step. Importantly, GRACE does not require fine-tuning or re-training the LM. Compared with conventional decoding strategies on four popular math reasoning benchmarks, GRACE exhibits significant improvements in both final-answer accuracy and step correctness, outperforming both greedy decoding and self-consistency. Our code can be found at https://github.com/mukhal/grace.

* 19 pages, 7 figures, and 8 tables
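
A sketch of discriminator-guided step selection, assuming a simple scoring rule that is not taken from the paper: the LM log-probabilities of the candidate steps are renormalized with a softmax and linearly blended with a discriminator score through a hypothetical weight `beta`. The toy LM and toy discriminator are illustrative stubs; a real setup would sample candidates from the LM and score them with a trained discriminator.

```python
import math
from typing import Callable, List


def choose_next_step(prefix: str, candidates: List[str],
                     lm_logprob: Callable[[str, str], float],
                     discriminator: Callable[[str, str], float],
                     beta: float = 0.5) -> str:
    """Pick the next reasoning step by blending LM likelihood with a correctness score.

    lm_logprob(prefix, step): log p(step | prefix) under the LM (assumed available).
    discriminator(prefix, step): score in [0, 1]; higher means more likely correct.
    beta: weight on the discriminator; beta = 0 recovers likelihood-only selection.
    """
    logps = [lm_logprob(prefix, step) for step in candidates]
    top = max(logps)
    exps = [math.exp(lp - top) for lp in logps]     # numerically stable softmax
    z = sum(exps)
    lm_probs = [e / z for e in exps]
    scores = [(1 - beta) * p + beta * discriminator(prefix, step)
              for p, step in zip(lm_probs, candidates)]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]


# Toy setting: the LM prefers a wrong step; the toy discriminator checks the arithmetic.
TOY_LOGPROBS = {"12 * 3 = 36": -1.5, "12 * 3 = 32": -0.5}


def toy_lm_logprob(prefix: str, step: str) -> float:
    return TOY_LOGPROBS[step]


def toy_discriminator(prefix: str, step: str) -> float:
    lhs, rhs = step.split("=")
    a, b = (int(tok) for tok in lhs.split("*"))
    return 1.0 if a * b == int(rhs) else 0.0


prefix = "Q: What is 12 * 3?"
candidates = list(TOY_LOGPROBS)
print(choose_next_step(prefix, candidates, toy_lm_logprob, toy_discriminator, beta=0.0))  # → 12 * 3 = 32
print(choose_next_step(prefix, candidates, toy_lm_logprob, toy_discriminator, beta=0.5))  # → 12 * 3 = 36
```

With `beta=0.0` the rule behaves like picking the most likely step and makes the miscalibrated choice; adding the discriminator term flips it to the correct step without touching the LM's weights.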

Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging

Mar 23, 2023

Todd C. Hollon, Cheng Jiang, Asadur Chowdury, Mustafa Nasir-Moin, Akhil Kondepudi, Alexander Aabedi, Arjun Adapa, Wajd Al-Holou, Jason Heth, Oren Sagher (+15 more)

Abstract: Molecular classification has transformed the management of brain tumors by enabling more accurate prognostication and personalized treatment. However, timely molecular diagnostic testing for patients with brain tumors is limited, complicating surgical and adjuvant treatment and obstructing clinical trial enrollment. In this study, we developed DeepGlioma, a rapid (< 90 seconds), artificial-intelligence-based diagnostic screening system to streamline the molecular diagnosis of diffuse gliomas. DeepGlioma is trained using a multimodal dataset that includes stimulated Raman histology (SRH; a rapid, label-free, non-consumptive optical imaging method) and large-scale public genomic data. In a prospective, multicenter, international testing cohort of patients with diffuse glioma (n = 153) who underwent real-time SRH imaging, we demonstrate that DeepGlioma can predict the molecular alterations used by the World Health Organization to define the adult-type diffuse glioma taxonomy (IDH mutation, 1p19q co-deletion, and ATRX mutation), achieving a mean molecular classification accuracy of 93.3 ± 1.6%. Our results show how artificial intelligence and optical histology can provide a rapid and scalable adjunct to wet-lab methods for the molecular screening of patients with diffuse glioma.

* Paper published in Nature Medicine

A Picture is Worth a Thousand Words: Language Models Plan from Pixels

Mar 16, 2023

Anthony Z. Liu, Lajanugen Logeswaran, Sungryull Sohn, Honglak Lee

Abstract: Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments. In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments. Prior PLM-based approaches for planning either assume observations are available in the form of text (e.g., provided by a captioning model), reason about plans from the instruction alone, or incorporate information about the visual environment in limited ways (such as a pre-trained affordance function). In contrast, we show that PLMs can accurately plan even when observations are directly encoded as input prompts for the PLM. We show that this simple approach outperforms prior approaches in experiments on the ALFWorld and VirtualHome benchmarks.
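
One common way to encode an observation "directly as an input prompt" is to project image features into a few prefix embeddings with the same width as the LM's token embeddings and prepend them to the instruction. The abstract does not say which encoder or projection the authors use, so the shapes, the random `projection` matrix, and the function name below are hypothetical; the sketch only shows how such a prompt is assembled.

```python
import numpy as np


def build_visual_prompt(image_features: np.ndarray, text_embeddings: np.ndarray,
                        projection: np.ndarray, num_prefix_tokens: int) -> np.ndarray:
    """Prepend projected image features to the instruction's token embeddings.

    image_features: (d_img,) vector from some frozen image encoder (assumed).
    text_embeddings: (T, d_model) embeddings of the text instruction.
    projection: (d_img, num_prefix_tokens * d_model) learned matrix (hypothetical).
    """
    d_model = text_embeddings.shape[1]
    prefix = (image_features @ projection).reshape(num_prefix_tokens, d_model)
    return np.concatenate([prefix, text_embeddings], axis=0)


rng = np.random.default_rng(0)
d_img, d_model, k = 512, 64, 4
image_features = rng.normal(size=d_img)
text_embeddings = rng.normal(size=(10, d_model))     # a 10-token instruction
projection = rng.normal(size=(d_img, k * d_model)) / np.sqrt(d_img)
prompt = build_visual_prompt(image_features, text_embeddings, projection, k)
print(prompt.shape)  # → (14, 64)
```

The LM then consumes the 14 embeddings exactly as it would a 14-token text prompt, so no captioning model sits between the pixels and the planner.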

Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

Mar 14, 2023

Hyungjun Lim, Younggwan Kim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee

Abstract: Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the best-known example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use directly for wake-up word detection on mobile devices due to its high computational cost. In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at a minimal scale. In this method, the knowledge of the pre-trained wav2vec 2.0 is compressed by an auto-encoder-based dimensionality reduction technique and distilled into LiteFEW. Experimental results on the open-source "Hey Snips" dataset show that the proposed method, applied to various model structures, significantly improves performance, achieving over 20% relative improvement with only 64k parameters.

* Accepted by ICASSP 2023
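
A simplified sketch of the compress-then-distill idea, with a plainly named substitution: instead of a trained (possibly nonlinear) auto-encoder, it fits a linear auto-encoder in closed form, which is equivalent to PCA via the SVD. The teacher features are synthetic random vectors of width 768 (the hidden size of wav2vec 2.0 Base), and `distillation_loss` is a generic mean-squared-error objective, not necessarily the loss used in LiteFEW.

```python
import numpy as np


def fit_linear_autoencoder(features: np.ndarray, code_dim: int):
    """Closed-form linear auto-encoder (equivalent to PCA) on teacher features of shape (T, D)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    encoder = vt[:code_dim].T      # (D, code_dim): projects onto the top principal directions
    decoder = vt[:code_dim]        # (code_dim, D): maps codes back to the teacher space
    return mean, encoder, decoder


def reconstruction_error(features, mean, encoder, decoder) -> float:
    codes = (features - mean) @ encoder
    return float(np.mean((codes @ decoder + mean - features) ** 2))


def distillation_loss(student_out: np.ndarray, teacher_codes: np.ndarray) -> float:
    """Mean-squared error between a student's outputs and the compressed teacher targets."""
    return float(np.mean((student_out - teacher_codes) ** 2))


rng = np.random.default_rng(0)
teacher = rng.normal(size=(200, 768))   # 200 frames of 768-d features (synthetic stand-in)
errors = {d: reconstruction_error(teacher, *fit_linear_autoencoder(teacher, d))
          for d in (8, 32, 128)}
print(errors)  # reconstruction error shrinks as code_dim grows
mean, enc, _ = fit_linear_autoencoder(teacher, 32)
targets = (teacher - mean) @ enc         # (200, 32) targets a small student would regress onto
```

Distilling into a 32-dimensional code rather than the full 768-dimensional output is what lets the student stay tiny; the code size trades reconstruction fidelity for parameter count.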
