Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yansong Feng

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

Mar 12, 2024

Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

Abstract:In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity, our method dynamically selects experts based on the confidence level in expert selection for each input. This allows for a more efficient utilization of computational resources, activating more experts for complex tasks requiring advanced reasoning and fewer for simpler tasks. Through extensive evaluations, our dynamic routing method demonstrates substantial improvements over conventional Top-2 routing across various benchmarks, achieving an average improvement of 0.7% with less than 90% activated parameters. Further analysis shows our model dispatches more experts to tasks requiring complex reasoning skills, like BBH, confirming its ability to dynamically allocate computational resources in alignment with the input's complexity. Our findings also highlight a variation in the number of experts needed across different layers of the transformer model, offering insights into the potential for designing heterogeneous MoE frameworks. The code and models are available at https://github.com/ZhenweiAn/Dynamic_MoE.

Via

Access Paper or Ask Questions

ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Mar 04, 2024

Zirui Wu, Yansong Feng

Figure 1 for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Figure 2 for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Figure 3 for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Figure 4 for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context

Abstract:Tables play a crucial role in conveying information in various domains, serving as indispensable tools for organizing and presenting data in a structured manner. We propose a Plan-then-Reason framework to answer different types of user queries over tables with sentence context. The framework first plans the reasoning paths over the context, then assigns each step to program-based or textual reasoning to reach the final answer. We construct an instruction tuning set TrixInstruct following the framework. Our dataset cover queries that are program-unsolvable or need combining information from tables and sentences to obtain planning and reasoning abilities. We present ProTrix by finetuning Llama-2-7B on TrixInstruct. Our experiments show that ProTrix generalizes to diverse tabular tasks and achieves comparable performance to GPT-3.5-turbo. We further demonstrate that ProTrix can generate accurate and faithful explanations to answer complex free-form questions. Our work underscores the importance of the planning and reasoning abilities towards a model over tabular tasks with generalizability and interpretability. We will release our dataset and model at https://github.com/WilliamZR/ProTrix.

* under review

Via

Access Paper or Ask Questions

Teaching Large Language Models an Unseen Language on the Fly

Feb 29, 2024

Chen Zhang, Xiao Liu, Jiuheng Lin, Yansong Feng

Abstract:Existing large language models struggle to support numerous low-resource languages, particularly the extremely low-resource ones where there is minimal training data available for effective parameter updating. We thus investigate whether LLMs can learn a new language on the fly solely through prompting. To study this question, we collect a research suite for Zhuang, a language supported by no LLMs currently. We introduce \textsc{DiPMT++}, a framework for adapting LLMs to unseen languages by in-context learning. Using a dictionary and only 5K parallel sentences, \textsc{DiPMT++} significantly enhances the performance of GPT-4 from 0 to 16 BLEU for Chinese-to-Zhuang translation and achieves 32 BLEU for Zhuang-to-Chinese translation. Furthermore, we demonstrate the practical utility of this framework in aiding humans to translate completely unseen languages, which could contribute to the preservation of linguistic diversity.

Via

Access Paper or Ask Questions

Probing Multimodal Large Language Models for Global and Local Semantic Representation

Feb 27, 2024

Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

Figure 1 for Probing Multimodal Large Language Models for Global and Local Semantic Representation

Figure 2 for Probing Multimodal Large Language Models for Global and Local Semantic Representation

Figure 3 for Probing Multimodal Large Language Models for Global and Local Semantic Representation

Figure 4 for Probing Multimodal Large Language Models for Global and Local Semantic Representation

Abstract:The success of large language models has inspired researchers to transfer their exceptional representing ability to other modalities. Several recent works leverage image-caption alignment datasets to train multimodal large language models (MLLMs), which achieve state-of-the-art performance on image-to-text tasks. However, there are very few studies exploring whether MLLMs truly understand the complete image information, i.e., global information, or if they can only capture some local object information. In this study, we find that the intermediate layers of models can encode more global semantic information, whose representation vectors perform better on visual-language entailment tasks, rather than the topmost layers. We further probe models for local semantic representation through object detection tasks. And we draw a conclusion that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.

* Accepted by LREC-COLING 2024 as a short paper

Via

Access Paper or Ask Questions

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Feb 27, 2024

Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng

Figure 1 for Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Figure 2 for Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Figure 3 for Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Figure 4 for Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Abstract:Quantitative reasoning is a critical skill to analyze data, yet the assessment of such ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a carefully constructed dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers. To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText. We evaluate natural language reasoning, program-based reasoning, and agent reasoning methods including Chain-of-Thought, Program-of-Thoughts, ReAct, and code interpreter assistants on diverse models. The strongest model GPT-4 achieves an accuracy of 58%, which has a large room for improvement. Among open-source models, Deepseek-coder-instruct, a code LLM pretrained on 2T tokens, gets the highest accuracy of 37%. Analysis reveals that models encounter difficulties in data analysis and causal reasoning, and struggle in using causal knowledge and provided data simultaneously. Code and data are in https://github.com/xxxiaol/QRData.

* Project website: https://xxxiaol.github.io/QRData/

Via

Access Paper or Ask Questions

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

Feb 26, 2024

Mingxu Tao, Dongyan Zhao, Yansong Feng

Abstract:Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive and helpful answers. In practical applications, models also need to engage in extended discussions on potential scenarios closely relevant to the question. With augmentation of retrieval module, open-source Large Language Models (LLMs) can produce coherent answers often with different focuses, but are still sub-optimal in terms of reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework to leverage the synergy among multiple open-source LLMs aiming to provide \textbf{more correct} and \textbf{more comprehensive} answers for open-ended QA, although they are not strong enough individually. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers. We release our data and code at \url{https://github.com/kobayashikanna01/Chain-of-Discussion}.

* Under review

Via

Access Paper or Ask Questions

CASA: Causality-driven Argument Sufficiency Assessment

Jan 10, 2024

Xiao Liu, Yansong Feng, Kai-Wei Chang

Abstract:The argument sufficiency assessment task aims to determine if the premises of a given argument support its conclusion. To tackle this task, existing works often train a classifier on data annotated by humans. However, annotating data is laborious, and annotations are often inconsistent due to subjective criteria. Motivated by the probability of sufficiency (PS) definition in the causal literature, we propose CASA, a zero-shot causality-driven argument sufficiency assessment framework. PS measures how likely introducing the premise event would lead to the conclusion, when both the premise and conclusion events are absent. To estimate this probability, we propose to use large language models (LLMs) to generate contexts that are inconsistent with the premise and conclusion, and revise them by injecting the premise event. Experiments on two logical fallacy detection datasets demonstrate that CASA accurately identifies insufficient arguments. We further deploy CASA in a writing assistance application, and find that suggestions generated by CASA enhance the sufficiency of student-written arguments. Code and data are available at https://github.com/xxxiaol/CASA.

* Project website: https://xxxiaol.github.io/CASA/

Via

Access Paper or Ask Questions

MC^2: A Multilingual Corpus of Minority Languages in China

Nov 14, 2023

Chen Zhang, Mingxu Tao, Quzhe Huang, Jiuheng Lin, Zhibin Chen, Yansong Feng

Figure 1 for MC^2: A Multilingual Corpus of Minority Languages in China

Figure 2 for MC^2: A Multilingual Corpus of Minority Languages in China

Figure 3 for MC^2: A Multilingual Corpus of Minority Languages in China

Figure 4 for MC^2: A Multilingual Corpus of Minority Languages in China

Abstract:Large-scale corpora play a vital role in the construction of large language models (LLMs). However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data. To improve the accessibility of these languages, we present MC^2, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus so far. It encompasses four underrepresented languages, i.e., Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script. Notably, two writing systems in MC^2 are long neglected in previous corpora. As we identify serious contamination in the low-resource language split in the existing multilingual corpora, we propose a quality-centric solution for collecting MC^2, prioritizing quality and accuracy while enhancing representativeness and diversity. By in-depth analysis, we demonstrate the new research challenges MC^2 brings, such as long-text modeling and multiplicity of writing systems. We hope MC^2 can help enhance the equity of the underrepresented languages in China and provide a reliable data foundation for further research on low-resource languages.

* Work in progress

Via

Access Paper or Ask Questions

From the One, Judge of the Whole: Typed Entailment Graph Construction with Predicate Generation

Jun 07, 2023

Zhibin Chen, Yansong Feng, Dongyan Zhao

Abstract:Entailment Graphs (EGs) have been constructed based on extracted corpora as a strong and explainable form to indicate context-independent entailment relations in natural languages. However, EGs built by previous methods often suffer from the severe sparsity issues, due to limited corpora available and the long-tail phenomenon of predicate distributions. In this paper, we propose a multi-stage method, Typed Predicate-Entailment Graph Generator (TP-EGG), to tackle this problem. Given several seed predicates, TP-EGG builds the graphs by generating new predicates and detecting entailment relations among them. The generative nature of TP-EGG helps us leverage the recent advances from large pretrained language models (PLMs), while avoiding the reliance on carefully prepared corpora. Experiments on benchmark datasets show that TP-EGG can generate high-quality and scale-controllable entailment graphs, achieving significant in-domain improvement over state-of-the-art EGs and boosting the performance of down-stream inference tasks.

* 9 pages, 3 figures, accepted to ACL 2023

Via

Access Paper or Ask Questions

How Many Answers Should I Give? An Empirical Study of Multi-Answer Reading Comprehension

Jun 01, 2023

Chen Zhang, Jiuheng Lin, Xiao Liu, Yuxuan Lai, Yansong Feng, Dongyan Zhao

Abstract:The multi-answer phenomenon, where a question may have multiple answers scattered in the document, can be well handled by humans but is challenging enough for machine reading comprehension (MRC) systems. Despite recent progress in multi-answer MRC, there lacks a systematic analysis of how this phenomenon arises and how to better address it. In this work, we design a taxonomy to categorize commonly-seen multi-answer MRC instances, with which we inspect three multi-answer datasets and analyze where the multi-answer challenge comes from. We further analyze how well different paradigms of current multi-answer MRC models deal with different types of multi-answer instances. We find that some paradigms capture well the key information in the questions while others better model the relationship between questions and contexts. We thus explore strategies to make the best of the strengths of different paradigms. Experiments show that generation models can be a promising platform to incorporate different paradigms. Our annotations and code are released for further research.

* Findings of ACL 2023

Via

Access Paper or Ask Questions