Jiaao Chen

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

Aug 14, 2023
Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen

We consider the problem of eliciting compositional generalization capabilities in large language models (LLMs) with a novel type of prompting strategy. Compositional generalization empowers LLMs to solve problems that are harder than the ones they have seen (i.e., easy-to-hard generalization), which is a critical reasoning capability of human-like intelligence. However, even the current state-of-the-art LLMs still struggle with this form of reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting, which instructs LLMs how to compose basic skills to solve more complex problems. We find that it is crucial to demonstrate both the skills and the compositional examples within the same prompting context. With as few as two exemplars, our SKiC prompting initiates strong synergies between skills and their composition capabilities. Notably, it empowers LLMs to solve unseen problems that require innovative skill compositions, achieving near-perfect generalization on a broad range of challenging compositionality tasks. Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling them to leverage pre-existing internal skills acquired during earlier pre-training stages, even when these skills are not explicitly presented in the prompting context. As a result, LLMs can solve unseen complex problems by activating and composing internal competencies. With these prominent features, SKiC prompting achieves state-of-the-art performance on challenging mathematical reasoning benchmarks (e.g., MATH).
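
To make the prompt structure concrete, here is a minimal sketch, assuming hypothetical skill descriptions, a hypothetical worked exemplar, and a made-up helper name (build_skic_prompt); the paper's actual skills and exemplars are task-specific and not reproduced here.

```python
# A minimal sketch of assembling a skills-in-context (SKiC) style prompt:
# basic skills and a compositional exemplar appear in the SAME context,
# followed by the new problem. All content below is illustrative.

SKILLS = [
    "Skill 1 (split_digits): write a number as its digits, e.g., 476 -> [4, 7, 6].",
    "Skill 2 (add_with_carry): add two digits plus a carry, e.g., 7 + 6 + carry 1 -> digit 4, carry 1.",
]

# One worked exemplar that explicitly composes the skills above.
EXEMPLARS = [
    "Problem: 47 + 76\n"
    "Solution: split_digits gives [4, 7] and [7, 6]. add_with_carry on the ones "
    "place: 7 + 6 -> digit 3, carry 1. add_with_carry on the tens place: "
    "4 + 7 + carry 1 -> digit 2, carry 1. Reading carry then digits: 123.",
]

def build_skic_prompt(problem: str) -> str:
    """Place skills and compositional exemplars in one context, then the task."""
    parts = ["Basic skills:", *SKILLS,
             "", "Examples that compose the skills:", *EXEMPLARS,
             "", "Now compose the skills to solve the new problem.",
             f"Problem: {problem}", "Solution:"]
    return "\n".join(parts)

print(build_skic_prompt("385 + 967"))
```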

Informative Path Planning of Autonomous Vehicle for Parking Occupancy Estimation

Aug 01, 2023
Yunze Hu, Jiaao Chen, Kangjie Zhou, Han Gao, Yutong Li, Chang Liu

Parking occupancy estimation holds significant potential in facilitating parking resource management and mitigating traffic congestion. Existing approaches employ robotic systems to detect the occupancy status of individual parking spaces and primarily focus on enhancing detection accuracy through perception pipelines. However, these methods often overlook the crucial aspect of robot path planning, which can hinder the accurate estimation of the entire parking area. In light of these limitations, we introduce the problem of informative path planning for parking occupancy estimation using autonomous vehicles and formulate it as a Partially Observable Markov Decision Process (POMDP) task. Then, we develop an occupancy state transition model and introduce a Bayes filter to estimate occupancy based on noisy sensor measurements. Subsequently, we propose the Monte Carlo Bayes Filter Tree, a computationally efficient algorithm that leverages progressive widening to generate informative paths. We demonstrate that the proposed approach outperforms the benchmark methods in diverse simulation environments, effectively striking a balance between optimality and computational efficiency.

* Extended version of publication in ITSC 2023 
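
A toy sketch of the Bayes-filter ingredient follows, assuming a two-state (occupied/vacant) transition model and made-up sensor noise rates; the paper's POMDP planner (the Monte Carlo Bayes Filter Tree with progressive widening) is not shown.

```python
# A toy Bayes filter for one parking space's occupancy under assumed
# transition probabilities and sensor false-positive/false-negative rates.
# These numbers are illustrative, not the paper's model parameters.

P_ARRIVE = 0.05  # P(vacant -> occupied) per step (assumed)
P_DEPART = 0.03  # P(occupied -> vacant) per step (assumed)
P_HIT    = 0.90  # P(sensor reads "occupied" | occupied) (assumed)
P_FALSE  = 0.10  # P(sensor reads "occupied" | vacant) (assumed)

def predict(belief_occ: float) -> float:
    """Propagate the occupancy belief through the transition model."""
    return belief_occ * (1 - P_DEPART) + (1 - belief_occ) * P_ARRIVE

def update(belief_occ: float, z_occupied: bool) -> float:
    """Correct the belief with a noisy measurement via Bayes' rule."""
    like_occ = P_HIT if z_occupied else 1 - P_HIT
    like_vac = P_FALSE if z_occupied else 1 - P_FALSE
    numer = like_occ * belief_occ
    return numer / (numer + like_vac * (1 - belief_occ))

belief = 0.5  # uninformative prior
for z in (True, True, False):  # a made-up measurement sequence
    belief = update(predict(belief), z)
    print(f"measurement={z!s:5}, belief(occupied)={belief:.3f}")
```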

Can Large Language Models Transform Computational Social Science?

Apr 12, 2023
Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

Large Language Models (LLMs) like ChatGPT are capable of successfully performing many language processing tasks zero-shot (without the need for training data). If this capacity also applies to the coding of social phenomena like persuasiveness and political ideology, then LLMs could effectively transform Computational Social Science (CSS). This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 24 representative CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that today's LLMs can radically augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the hidden meaning behind text). In summary, LLMs can significantly reduce costs and increase efficiency of social science analysis in partnership with humans.
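
As a rough sketch of the zero-shot annotation setting, the snippet below builds a labeling prompt and maps free-form model output back onto a fixed taxonomy; the label set and the call_llm stub are hypothetical placeholders, not the paper's prompting best practices or evaluation pipeline.

```python
# An illustrative zero-shot annotation sketch; the label set and `call_llm`
# stub are hypothetical placeholders, not the paper's pipeline.

LABELS = ["persuasive", "not persuasive"]  # assumed taxonomy

def build_prompt(text: str) -> str:
    return ("Label the text with exactly one of: " + ", ".join(LABELS)
            + f".\n\nText: {text}\nLabel:")

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion API call."""
    return "persuasive"

def annotate(text: str) -> str:
    raw = call_llm(build_prompt(text)).strip().lower()
    # Match longer labels first so "not persuasive" is not mistaken
    # for "persuasive".
    for label in sorted(LABELS, key=len, reverse=True):
        if label in raw:
            return label
    return "unknown"

print(annotate("Vote early: the lines are shorter and your ballot still counts."))
```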

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Apr 10, 2023
Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks, such as image generation. However, because they are inherently built for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise does not handle discrete corruption well, and objectives defined in continuous spaces can be unstable for textual data during the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce Masked-Diffuse LM, a novel diffusion model for language modeling with lower training cost and better performance, inspired by linguistic features of language. Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data. We also directly predict the categorical distribution with a cross-entropy loss at every diffusion step, connecting the continuous and discrete spaces in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that our Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with better efficiency.

* Code is available at https://github.com/amazon-science/masked-diffusion-lm 
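
A toy numpy sketch of the two ideas, under assumed importance weights and a simple linear schedule: tokens are softly blended toward a [MASK] embedding (more "important" tokens earlier), and a categorical cross-entropy is computed at a given step. None of the specific numbers or the stand-in "denoiser" come from the paper.

```python
import numpy as np

# Toy soft-masking forward process + per-step cross-entropy readout.
# Embeddings, importance weights, and the schedule are illustrative.

rng = np.random.default_rng(0)
VOCAB, DIM, T = 100, 16, 10
emb = rng.normal(size=(VOCAB, DIM))   # toy token embeddings
mask_emb = rng.normal(size=DIM)       # toy [MASK] embedding

def soft_mask(token_ids, importance, t):
    """Blend each token toward [MASK]; higher importance => corrupted sooner."""
    # Per-token corruption level in [0, 1], growing with step t and importance.
    level = np.clip((t / T) * (0.5 + importance), 0.0, 1.0)[:, None]
    return (1 - level) * emb[token_ids] + level * mask_emb

def cross_entropy(logits, targets):
    """Categorical cross-entropy applied at every diffusion step."""
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

tokens = np.array([3, 17, 42])
importance = np.array([0.9, 0.1, 0.5])  # e.g., tf-idf-like weights (assumed)
x_t = soft_mask(tokens, importance, t=4)
logits = x_t @ emb.T                    # a stand-in "denoiser" readout
print("step-4 loss:", cross_entropy(logits, tokens))
```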

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

Feb 15, 2023
Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning), while it still faces challenges on specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.
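
The evaluation protocol amounts to a loop like the sketch below: query the model zero-shot per example and aggregate accuracy per task category. The example data and the query_model stub are made-up placeholders, not the 20 datasets or prompts actually used.

```python
from collections import defaultdict

def query_model(prompt: str) -> str:
    """Stand-in for a real ChatGPT API call."""
    return "positive"

DATASETS = {
    # category -> list of (input text, gold label); made up for illustration
    "sentiment": [("A wonderful film.", "positive"), ("Dull and slow.", "negative")],
    "arithmetic": [("What is 17 + 25?", "42")],
}

scores = defaultdict(list)
for category, examples in DATASETS.items():
    for text, gold in examples:
        # Zero-shot: task instructions only, no demonstrations or fine-tuning.
        pred = query_model(f"Answer the following {category} task.\n{text}\nAnswer:")
        scores[category].append(pred.strip().lower() == gold.lower())

for category, hits in scores.items():
    print(f"{category}: {sum(hits)}/{len(hits)} correct")
```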

Parameter-Efficient Fine-Tuning Design Spaces

Jan 04, 2023
Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang

Parameter-efficient fine-tuning aims to achieve performance comparable to full fine-tuning with fewer trainable parameters. Several strategies (e.g., Adapters, prefix tuning, BitFit, and LoRA) have been proposed, but their designs are hand-crafted separately, and it remains unclear whether certain design patterns exist for parameter-efficient fine-tuning. Thus, we present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings. Instead of designing yet another individual tuning strategy, we introduce parameter-efficient fine-tuning design spaces that parameterize tuning structures and tuning strategies. Any design space is characterized by four components: layer grouping, trainable parameter allocation, tunable groups, and strategy assignment. Starting from an initial design space, we progressively refine the space based on the model quality of each design choice, making a greedy selection over these four components at each stage. We discover the following design patterns: (i) group layers in a spindle pattern; (ii) allocate the number of trainable parameters to layers uniformly; (iii) tune all of the groups; (iv) assign proper tuning strategies to different groups. These design patterns result in new parameter-efficient fine-tuning methods, which we show experimentally to consistently and significantly outperform the investigated parameter-efficient fine-tuning strategies across different backbone models and different NLP tasks.

* Code is available at https://github.com/amazon-science/peft-design-spaces 
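
A simplified sketch of patterns (i)-(iv): spindle-shaped layer grouping, a uniform (assumed) per-layer trainable-parameter budget, tuning every group, and a per-group strategy assignment. The group weights, budget, and pairing of strategies to groups are illustrative assumptions, not the released implementation.

```python
def spindle_groups(n_layers: int, n_groups: int = 4):
    """Split layer indices into groups that are thin at the ends, wide in the middle."""
    # Weight middle groups twice as heavily as the end groups (spindle shape).
    weights = [1 if i in (0, n_groups - 1) else 2 for i in range(n_groups)]
    total = sum(weights)
    sizes = [n_layers * w // total for w in weights]
    sizes[n_groups // 2] += n_layers - sum(sizes)  # absorb rounding remainder
    groups, start = [], 0
    for s in sizes:
        groups.append(list(range(start, start + s)))
        start += s
    return groups

STRATEGIES = ["adapter", "prefix", "lora", "bitfit"]  # candidate pool (assumed pairing)
BUDGET_PER_LAYER = 10_000  # uniform trainable-parameter allocation (assumed)

# Tune ALL groups, one strategy per group.
for group, strategy in zip(spindle_groups(12), STRATEGIES):
    print(f"layers {group}: tune with {strategy}, "
          f"{BUDGET_PER_LAYER * len(group):,} trainable params")
```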

Human-in-the-loop Abstractive Dialogue Summarization

Dec 19, 2022
Jiaao Chen, Mohan Dodda, Diyi Yang

Abstractive dialogue summarization has received increasing attention recently. Although most current dialogue summarization systems are trained to maximize the likelihood of human-written summaries and have achieved significant results, there is still a large gap in generating summaries that humans judge to be high quality, e.g., coherent and faithful, partly due to the misalignment introduced by maximizing the likelihood of a single human-written summary. To this end, we propose to incorporate different levels of human feedback into the training process, which enables us to guide the models toward the behaviors humans care about in summaries. Specifically, we ask humans to highlight the salient information to be included in summaries to provide local feedback, and to make overall comparisons among summaries in terms of coherence, accuracy, coverage, conciseness, and overall quality as global feedback. We then combine both local and global feedback to fine-tune the dialogue summarization policy with reinforcement learning. Experiments conducted on multiple datasets demonstrate the effectiveness and generalization of our methods over state-of-the-art supervised baselines, especially in terms of human judgments.
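
A toy sketch of combining the two signals into a single RL reward: a local reward from coverage of human-highlighted spans and a global reward from a stubbed preference score. The scoring functions and the weighting alpha are illustrative assumptions, not the paper's formulation.

```python
def local_reward(summary: str, highlights: list[str]) -> float:
    """Fraction of human-highlighted salient spans covered by the summary."""
    covered = sum(h.lower() in summary.lower() for h in highlights)
    return covered / max(len(highlights), 1)

def global_reward(summary: str) -> float:
    """Stub for a preference model trained on human summary comparisons
    (coherence, accuracy, coverage, conciseness, overall quality)."""
    return min(len(set(summary.split())) / 20.0, 1.0)  # toy proxy only

def combined_reward(summary: str, highlights: list[str], alpha: float = 0.5) -> float:
    # alpha balances local vs. global feedback; 0.5 is an assumed default.
    return alpha * local_reward(summary, highlights) + (1 - alpha) * global_reward(summary)

highlights = ["approved the budget", "meeting to Friday"]  # made-up highlights
summary = "The team approved the budget and moved the meeting to Friday."
print(f"reward = {combined_reward(summary, highlights):.3f}")
```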

WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain

Oct 31, 2022
Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, Diyi Yang

Pre-trained language models have shown impressive performance on a variety of tasks and domains. Previous research on financial language models usually employs a generic training scheme to train standard model architectures, without fully leveraging the richness of financial data. We propose FLANG, a novel domain-specific Financial LANGuage model that uses financial keywords and phrases for better masking, together with a span boundary objective and an in-filling objective. Additionally, evaluation benchmarks in the field have been limited. To this end, we contribute the Financial Language Understanding Evaluation (FLUE), an open-source comprehensive suite of benchmarks for the financial domain. These include new benchmarks across 5 NLP tasks in the financial domain as well as common benchmarks used in previous research. Experiments on these benchmarks suggest that our model outperforms prior models on a variety of NLP tasks. Our models, code, and benchmark data are publicly available on GitHub and Hugging Face.
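
A toy sketch of keyword-preferential masking: financial terms are masked at a higher (assumed) rate than ordinary tokens. The keyword list and probabilities are made up, and the span-boundary and in-filling objectives are not shown.

```python
import random

# Toy keyword-preferential masking: assumed financial keywords are masked
# with higher probability than other tokens. All values are illustrative.

FINANCIAL_KEYWORDS = {"dividend", "equity", "liquidity", "arbitrage"}  # assumed
P_KEYWORD, P_OTHER = 0.5, 0.1  # assumed masking rates

def mask_sentence(tokens: list[str], seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        p = P_KEYWORD if tok.lower() in FINANCIAL_KEYWORDS else P_OTHER
        out.append("[MASK]" if rng.random() < p else tok)
    return out

print(mask_sentence("Higher liquidity lowers the arbitrage spread on equity".split()))
```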
