Hao He

Logical Negation Augmenting and Debiasing for Prompt-based Methods

May 08, 2024

Yitian Li, Jidong Tian, Hao He, Yaohui Jin

Abstract: Prompt-based methods have gained increasing attention in NLP and have shown their validity on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to perform logical reasoning. In this work, we focus on the effectiveness of prompt-based methods on first-order logical reasoning and find that the bottleneck lies in logical negation. Based on our analysis, logical negation tends to result in spurious correlations to negative answers, while propositions without logical negation correlate to positive answers. To solve the problem, we propose a simple but effective method, Negation Augmenting and Negation Debiasing (NAND), which introduces negative propositions to prompt-based methods without updating parameters. Specifically, these negative propositions can counteract spurious correlations by providing "not" for all instances so that models cannot make decisions only by whether expressions contain a logical negation. Experiments on three datasets show that NAND not only solves the problem of calibrating logical negation but also significantly enhances prompt-based methods of logical reasoning without model retraining.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Apr 02, 2024

Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

Abstract: Controllability plays a crucial role in video generation since it allows users to create desired content. However, existing models have largely overlooked the precise control of camera pose, which serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely parameterizing the camera trajectory, a plug-and-play camera module is trained on a T2V model, leaving the other components untouched. Additionally, a comprehensive study on the effect of various datasets is conducted, suggesting that videos with diverse camera distribution and similar appearances indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. Our project website is at: https://hehao13.github.io/projects-CameraCtrl/.

* Project page: https://hehao13.github.io/projects-CameraCtrl/ Code: https://github.com/hehao13/CameraCtrl

Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models

Feb 24, 2024

Haoran Liao, Jidong Tian, Shaohua Hu, Hao He, Yaohui Jin

Abstract: Large language models (LLMs) have exhibited impressive performance across NLP tasks. So far they still face challenges in complex reasoning tasks and can be sensitive to input context. Although significant effort has been invested in enhancing the reasoning process and improving the robustness of prefix prompts, the crucial role of problem context has been overlooked. In this study, we propose a new approach to improve the mathematical capacities of LLMs, named Problem Elaboration Prompting (PEP). Specifically, PEP decomposes and elucidates the problem context before reasoning, thus enhancing the global context modeling and reducing the parsing difficulties. Experiments on datasets demonstrate promising performances on complex reasoning and indicate the beneficial impact for ill-formed problems. For instance, with the GPT-3.5 model (text-davinci-003), we observed a 9.93% improvement with greedy decoding and 8.80% improvement with self-consistency on GSM8k compared to the standard CoT. With ChatGPT (turbo) and PEP, we achieve SOTA performances on SVAMP with 86.2% and GSM8k with 90.98%.
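
The elaborate-then-reason idea in the abstract above can be illustrated with a small prompt pipeline. This is only a sketch of one possible reading, not the authors' implementation: the two-call structure, the template wording, and the names `ELABORATE_TEMPLATE`, `SOLVE_TEMPLATE`, `pep_answer`, and the `llm` callable are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical prompt templates; the paper's actual wording is not given here.
ELABORATE_TEMPLATE = (
    "Problem: {problem}\n"
    "Before solving, break the problem into individual statements and "
    "clarify what each one means and what is being asked."
)
SOLVE_TEMPLATE = (
    "Problem: {problem}\n"
    "Elaboration: {elaboration}\n"
    "Now solve the problem step by step and state the final answer."
)


def pep_answer(problem: str, llm: Callable[[str], str]) -> str:
    """Elaborate the problem context first, then reason over it.

    `llm` is any function mapping a prompt string to a completion string
    (an assumption; plug in whatever client is available).
    """
    # Stage 1: ask the model to decompose and elucidate the problem.
    elaboration = llm(ELABORATE_TEMPLATE.format(problem=problem))
    # Stage 2: reason with the original problem plus its elaboration.
    return llm(SOLVE_TEMPLATE.format(problem=problem, elaboration=elaboration))
```

Passing the model in as a plain callable keeps the sketch independent of any particular API; a single-prompt variant that asks for elaboration and solution in one completion would be an equally plausible reading of the abstract.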

Understanding LLMs: A Comprehensive Overview from Training to Inference

Jan 06, 2024

Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong(+11 more)

Abstract: The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of large language model training techniques and inference deployment technologies aligned with this emerging trend. The discussion on training covers data preprocessing, training architecture, pre-training tasks, parallel training, and content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs' utilization and provides insights into their future development.

* 30 pages, 6 figures

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

Dec 17, 2023

Haoran Liao, Qinyi Du, Shaohua Hu, Hao He, Yanyan Xu, Jidong Tian, Yaohui Jin

Abstract: Large language models (LLMs) face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems at once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of the mathematical reasoning process. Specifically, we propose a formal description of mathematical solving and extend LLMs with an agent-based zero-shot framework named Planner-Reasoner-Executor-Reflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and the proposed MathAgents, achieving an increase of 12.3% (53.9% → 66.2%) on miniF2F, 9.2% (49.8% → 59.0%) on MATH, and 13.2% (23.2% → 35.4%) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.

* There are unfair comparisons on miniF2F. This will be fixed in the future

Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration Selection

Dec 12, 2023

Caoyun Fan, Jidong Tian, Yitian Li, Hao He, Yaohui Jin

Abstract: In-Context Learning (ICL) is an important paradigm for adapting Large Language Models (LLMs) to downstream tasks through a few demonstrations. Despite the great success of ICL, the limitation of the demonstration number may lead to demonstration bias, i.e., the input-label mapping induced by LLMs misunderstands the task's essence. Inspired by human experience, we attempt to mitigate such bias through the perspective of the inter-demonstration relationship. Specifically, we construct Comparable Demonstrations (CDs) by minimally editing the texts to flip the corresponding labels, in order to highlight the task's essence and eliminate potential spurious correlations through the inter-demonstration comparison. Through a series of experiments on CDs, we find that (1) demonstration bias does exist in LLMs, and CDs can significantly reduce such bias; (2) CDs exhibit good performance in ICL, especially in out-of-distribution scenarios. In summary, this study explores the ICL mechanisms from a novel perspective, providing a deeper insight into the demonstration selection strategy for ICL.

* ICASSP 2024
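
The construction described in the abstract above, pairing each demonstration with a minimally edited copy whose label is flipped, can be sketched as a prompt builder. This is an illustrative sketch under assumptions, not the authors' code: the sentiment task, the `Review:`/`Sentiment:` layout, and the name `build_cd_prompt` are hypothetical, and the edited texts are assumed to be supplied by hand.

```python
from typing import List, Tuple

Demo = Tuple[str, str]  # (text, label)


def build_cd_prompt(pairs: List[Tuple[Demo, Demo]], query: str) -> str:
    """Build an in-context prompt from comparable demonstration pairs.

    Each pair holds an original demonstration and a minimally edited
    version of it whose label is flipped, placed next to each other so
    the contrast between the two is visible in the prompt.
    """
    blocks = []
    for (orig_text, orig_label), (edit_text, edit_label) in pairs:
        if orig_label == edit_label:
            # A pair with identical labels carries no contrast.
            raise ValueError("a comparable pair must have opposite labels")
        blocks.append(f"Review: {orig_text}\nSentiment: {orig_label}")
        blocks.append(f"Review: {edit_text}\nSentiment: {edit_label}")
    # The query goes last with an empty label slot for the model to fill.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


pairs = [(("The plot was gripping.", "positive"),
          ("The plot was tedious.", "negative"))]
prompt = build_cd_prompt(pairs, "The acting felt flat.")
```

Keeping the two members of a pair adjacent is a design choice of this sketch; the abstract does not say how the demonstrations are ordered.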

Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis

Dec 12, 2023

Caoyun Fan, Jindou Chen, Yaohui Jin, Hao He

Abstract: Game theory, as an analytical tool, is frequently utilized to analyze human behavior in social science research. With the high alignment between the behavior of Large Language Models (LLMs) and humans, a promising research direction is to employ LLMs as substitutes for humans in game experiments, enabling social science research. However, despite numerous empirical studies on the combination of LLMs and game theory, the capability boundaries of LLMs in game theory remain unclear. In this research, we endeavor to systematically analyze LLMs in the context of game theory. Specifically, rationality, as the fundamental principle of game theory, serves as the metric for evaluating players' behavior: building a clear desire, refining belief about uncertainty, and taking optimal actions. Accordingly, we select three classical games (dictator game, Rock-Paper-Scissors, and ring-network game) to analyze to what extent LLMs can achieve rationality in these three aspects. The experimental results indicate that even the current state-of-the-art LLM (GPT-4) exhibits substantial disparities compared to humans in game theory. For instance, LLMs struggle to build desires based on uncommon preferences, fail to refine belief from many simple patterns, and may overlook or modify refined belief when taking actions. Therefore, we consider that introducing LLMs into game experiments in the field of social science should be approached with greater caution.

* AAAI 2024

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Oct 29, 2023

Fei Zhang, Tianfei Zhou, Boyang Li, Hao He, Chaofan Ma, Tianjiao Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Abstract: This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in an all-to-one vs. one-to-one manner during the training and inference phases, respectively. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge. To this end, this paper proposes the non-learnable prototypical regularization (NPR), where non-learnable prototypes are estimated from source features to serve as supervision and enable contrastive matching of the group tokens. This regularization encourages the group tokens to segment objects with less redundancy and capture more comprehensive semantic regions, leading to increased compactness and richness. Based on NPR, we propose the prototypical guidance segmentation network (PGSeg) that incorporates multi-modal regularization by leveraging prototypical sources from both images and texts at different levels, progressively enhancing the segmentation capability with diverse prototypical patterns. Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets. The source code is available at https://github.com/Ferenas/PGSeg.

* 14 pages, accepted at NeurIPS 2023

Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language Understanding

Oct 18, 2023

Caoyun Fan, Jidong Tian, Yitian Li, Wenqing Chen, Hao He, Yaohui Jin

Abstract: Chain-of-Thought (CoT) is a technique that guides Large Language Models (LLMs) to decompose complex tasks into multi-step reasoning through intermediate steps in natural language form. Briefly, CoT enables LLMs to think step by step. However, although many Natural Language Understanding (NLU) tasks also require thinking step by step, LLMs perform less well than small-scale Masked Language Models (MLMs). To migrate CoT from LLMs to MLMs, we propose Chain-of-Thought Tuning (CoTT), a two-step reasoning framework based on prompt tuning, to implement step-by-step thinking for MLMs on NLU tasks. From the perspective of CoT, CoTT's two-step framework enables MLMs to implement task decomposition; CoTT's prompt tuning allows intermediate steps to be used in natural language form. Thereby, the success of CoT can be extended to NLU tasks through MLMs. To verify the effectiveness of CoTT, we conduct experiments on two NLU tasks: hierarchical classification and relation extraction, and the results show that CoTT outperforms baselines and achieves state-of-the-art performance.

* EMNLP 2023 Main Conference

Accurate Use of Label Dependency in Multi-Label Text Classification Through the Lens of Causality

Oct 11, 2023

Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin

Abstract: Multi-Label Text Classification (MLTC) aims to assign the most relevant labels to each given text. Existing methods demonstrate that label dependency can help to improve the model's performance. However, the introduction of label dependency may cause the model to suffer from unwanted prediction bias. In this study, we attribute the bias to the model's misuse of label dependency, i.e., the model tends to utilize the correlation shortcut in label dependency rather than fusing text information and label dependency for prediction. Motivated by causal inference, we propose a CounterFactual Text Classifier (CFTC) to eliminate the correlation bias and make causality-based predictions. Specifically, our CFTC first adopts the predict-then-modify backbone to extract precise label information embedded in label dependency, then blocks the correlation shortcut through the counterfactual de-bias technique with the help of the human causal graph. Experimental results on three datasets demonstrate that our CFTC significantly outperforms the baselines and effectively eliminates the correlation bias in datasets.

* Applied Intelligence 2023
