Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ashish Sabharwal

Shammie

Transformers Implement First-Order Logic with Majority Quantifiers

Oct 06, 2022

William Merrill, Ashish Sabharwal

Figure 1 for Transformers Implement First-Order Logic with Majority Quantifiers

Abstract:Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can their inner decision process be captured symbolically in some familiar logic? We show that any transformer neural network can be translated into an equivalent fixed-size first-order logic formula which may also use majority quantifiers. The idea is to simulate transformers with highly uniform threshold circuits and leverage known theoretical connections between circuits and logic. Our findings also reveal the surprising fact that the entire transformer computation can be reduced merely to the division of two (large) integers. While our results are most pertinent for transformers, they apply equally to a broader class of neural network architectures, namely those with a fixed-depth uniform computation graph made up of standard neural net components, which includes feedforward and convolutional networks.

Via

Access Paper or Ask Questions

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Oct 05, 2022

Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

Figure 1 for Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Figure 2 for Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Figure 3 for Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Figure 4 for Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Abstract:Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting using GPT3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task but with smaller inputs. We also evaluate our approach on textual multi-step reasoning tasks: on long-context multi-hop QA task, we can more effectively teach the sub-tasks via our separate sub-tasks prompts; and on open-domain multi-hop QA, we can incorporate a symbolic information retrieval within our decomposition framework, leading to improved performance on both tasks.

* Datasets, Code and Prompts available at https://github.com/allenai/DecomP

Via

Access Paper or Ask Questions

Complexity-Based Prompting for Multi-Step Reasoning

Oct 03, 2022

Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

Figure 1 for Complexity-Based Prompting for Multi-Step Reasoning

Figure 2 for Complexity-Based Prompting for Multi-Step Reasoning

Figure 3 for Complexity-Based Prompting for Multi-Step Reasoning

Figure 4 for Complexity-Based Prompting for Multi-Step Reasoning

Abstract:We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on math word reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3, our approach substantially improves multi-step reasoning accuracy, with an 8.6% absolute improvement on GSM8K, and 6.4% on MathQA. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of our methods under format perturbation and distribution shift.

* Preprint

Via

Access Paper or Ask Questions

Log-Precision Transformers are Constant-Depth Uniform Threshold Circuits

Jul 02, 2022

William Merrill, Ashish Sabharwal

Abstract:We prove that transformer neural networks with logarithmic precision in the input length (and where the feedforward subnetworks are computable using linear space in their input length) can be simulated by constant-depth uniform threshold circuits. Thus, such transformers only recognize formal languages in $\mathsf{TC}^0$, the class of languages defined by constant-depth, poly-size threshold circuits. This demonstrates a connection between a practical claim in NLP and a theoretical conjecture in computational complexity theory: "attention is all you need" (Vaswani et al., 2017), i.e., transformers are capable of all efficient computation, only if all efficiently computable problems can be solved with log space, i.e., $\mathsf L = \mathsf P$. We also construct a transformer that can evaluate any constant-depth threshold circuit on any input, proving that transformers can follow instructions that are representable in $\mathsf{TC}^0$.

* Preprint

Via

Access Paper or Ask Questions

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso(+435 more)

Abstract:Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

* 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Via

Access Paper or Ask Questions

Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

May 25, 2022

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

Figure 1 for Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

Figure 2 for Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

Figure 3 for Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

Figure 4 for Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

Abstract:Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create synthetic contexts for real questions in six multihop reasoning datasets. These contexts are carefully designed to avoid common reasoning shortcuts prevalent in real contexts that prevent models from learning the right skills. This results in a pretraining dataset, named TeaBReaC, containing 525K multihop questions (with associated formal programs) covering about 900 reasoning patterns. We show that pretraining standard language models (LMs) on TeaBReaC before fine-tuning them on target datasets improves their performance by up to 13 EM points across 3 multihop QA datasets, with a 30 point gain on more complex questions. The resulting models also demonstrate higher robustness, with a 6-11 point improvement on two contrast sets. Furthermore, TeaBReaC pretraining substantially improves model performance and robustness even when starting with numeracy-aware LMs pretrained using recent methods (e.g., PReasM). Our work thus shows how one can effectively use decomposition-guided contexts to robustly teach multihop reasoning.

Via

Access Paper or Ask Questions

Better Retrieval May Not Lead to Better Question Answering

May 07, 2022

Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal

Figure 1 for Better Retrieval May Not Lead to Better Question Answering

Figure 2 for Better Retrieval May Not Lead to Better Question Answering

Figure 3 for Better Retrieval May Not Lead to Better Question Answering

Figure 4 for Better Retrieval May Not Lead to Better Question Answering

Abstract:Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC). A popular approach to improve the system's performance is to improve the quality of the retrieved context from the IR stage. In this work we show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reasoning, this common approach is surprisingly ineffective -- improving the quality of the retrieved context hardly improves the system's performance. We further analyze the system's behavior to identify potential reasons.

* 10 pages

Via

Access Paper or Ask Questions

What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Apr 19, 2022

Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

Figure 1 for What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Figure 2 for What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Figure 3 for What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Figure 4 for What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Abstract:The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a given string matches a regular expression (viewed as an instruction) to identify properties of tasks, instructions, and instances that make instruction learning challenging. For instance, we find that our model, a fine-tuned T5-based text2text transformer, struggles with large regular languages, suggesting that less precise instructions are challenging for models. Additionally, instruction executions that require tracking longer contexts of prior steps are also more difficult. We use our findings to systematically construct a challenging instruction learning dataset, which we call Hard RegSet. Fine-tuning on Hard RegSet, our large transformer learns to correctly interpret only 65.6% of test instructions (with at least 90% accuracy), and 11%-24% of the instructions in out-of-distribution generalization settings. We propose Hard RegSet as a challenging instruction learning task, and a controlled environment for studying instruction learning.

Via

Access Paper or Ask Questions

Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Dec 16, 2021

Kyle Richardson, Ashish Sabharwal

Figure 1 for Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Figure 2 for Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Figure 3 for Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Figure 4 for Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

Abstract:Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theories, when sampled uniformly at random, do not necessarily lead to hard instances. We propose a new methodology for creating challenging algorithmic reasoning datasets that focus on natural language satisfiability (NLSat) problems. The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language. This methodology allows us to distinguish easy from hard instances, and to systematically increase the complexity of existing reasoning benchmarks such as RuleTaker. We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems of substantially increased difficulty. They also exhibit some degree of scale-invariance - the ability to generalize to problems of larger size and scope. Our results, however, reveal important limitations too: a careful sampling of training data is crucial for building models that generalize to larger problems, and transformer models' limited scale-invariance suggests they are far from learning robust deductive reasoning algorithms.

* Accepted to AAAI-2022, AAAI preprint

Via

Access Paper or Ask Questions

PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

Dec 15, 2021

Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sameer Singh, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal(+1 more)

Figure 1 for PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

Figure 2 for PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

Figure 3 for PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

Figure 4 for PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

Abstract:Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and their nearest neighbor discrete projections: We can find continuous prompts that solve a task while being projected to an arbitrary text (e.g., definition of a different or even a contradictory task), while being within a very small (2%) margin of the best continuous prompt of the same size for the task. We provide intuitions behind this odd and surprising behavior, as well as extensive empirical analyses quantifying the effect of various parameters. For instance, for larger model sizes we observe higher waywardness, i.e, we can find prompts that more closely map to any arbitrary text with a smaller drop in accuracy. These findings have important implications relating to the difficulty of faithfully interpreting continuous prompts and their generalization across models and tasks, providing guidance for future progress in prompting language models.

* Work in Progress

Via

Access Paper or Ask Questions