Zhou Yu

ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

Sep 04, 2023
Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown

Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g., formality) to authorship (e.g., Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language models. In contrast, we introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles at inference time. Our parameter-efficient approach, ParaGuide, leverages paraphrase-conditioned diffusion models alongside gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders to transform the style of text while preserving semantic information. We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
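
To make the guidance mechanism concrete, here is a minimal sketch of one classifier-guided denoising step under toy assumptions: guided_denoise_step, denoiser, and style_scorer are hypothetical stand-ins for the paraphrase-conditioned diffusion model and the off-the-shelf style classifier or embedder, not the released ParaGuide code.

```python
# Hypothetical sketch: nudge an intermediate diffusion latent along the gradient
# of a style scorer before the next denoising prediction.
import torch

def guided_denoise_step(x_t, denoiser, style_scorer, guidance_scale=1.0):
    """One reverse-diffusion step with gradient-based style guidance.

    x_t          -- current latent, shape (batch, seq_len, hidden)
    denoiser     -- callable: latent -> next-step latent (stand-in for the
                    paraphrase-conditioned diffusion model)
    style_scorer -- callable: latent -> per-example style score (stand-in for
                    an off-the-shelf classifier or style embedder)
    """
    x = x_t.detach().requires_grad_(True)
    score = style_scorer(x).sum()            # higher = closer to target style
    grad = torch.autograd.grad(score, x)[0]  # direction that raises the score
    x_prev = denoiser(x.detach())            # ordinary denoising prediction
    return x_prev + guidance_scale * grad    # push the latent toward the style

# Toy stand-ins so the sketch runs end to end.
hidden = 16
denoiser = torch.nn.Linear(hidden, hidden)
style_scorer = lambda z: torch.nn.functional.linear(z, torch.ones(1, hidden)).mean(dim=(1, 2))
x_t = torch.randn(2, 8, hidden)
print(guided_denoise_step(x_t, denoiser, style_scorer, guidance_scale=0.5).shape)
```

In such a setup, the guidance scale trades off style strength against preservation of the paraphrased meaning.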

MultiPA: a multi-task speech pronunciation assessment system for a closed and open response scenario

Aug 24, 2023
Yu-Wen Chen, Zhou Yu, Julia Hirschberg

The design of automatic speech pronunciation assessment can be categorized into closed and open response scenarios, each with strengths and limitations. A system with the ability to function in both scenarios can cater to diverse learning needs and provide a more precise and holistic assessment of pronunciation skills. In this study, we propose a Multi-task Pronunciation Assessment model called MultiPA. MultiPA provides an alternative to Kaldi-based systems in that it has simpler format requirements and better compatibility with other neural network models. Compared with previous open response systems, MultiPA provides a wider range of evaluations, encompassing assessments at both the sentence and word levels. Our experimental results show that MultiPA achieves comparable performance when working in closed response scenarios and maintains more robust performance when directly used for open responses.
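
The sentence- and word-level outputs described above suggest a shared encoder feeding separate scoring heads; the sketch below illustrates that reading with hypothetical layer sizes (MultiTaskPronunciationHead is an illustrative name, not the released MultiPA model).

```python
# Hypothetical multi-task heads: one utterance-level score plus per-frame scores.
import torch
import torch.nn as nn

class MultiTaskPronunciationHead(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.sentence_head = nn.Linear(hidden, 1)  # sentence-level score
        self.word_head = nn.Linear(hidden, 1)      # score per word/frame

    def forward(self, frame_feats):                # (batch, frames, hidden)
        sent = self.sentence_head(frame_feats.mean(dim=1)).squeeze(-1)
        word = self.word_head(frame_feats).squeeze(-1)
        return sent, word

feats = torch.randn(2, 50, 256)                    # e.g. features from a speech encoder
sent_scores, word_scores = MultiTaskPronunciationHead()(feats)
print(sent_scores.shape, word_scores.shape)        # (2,) and (2, 50)
```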

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

Jul 20, 2023
Jianguo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong

Despite advancements in conversational AI, language models encounter challenges in handling diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, code, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio.
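
As a small illustration of instruction-aware fine-tuning with domain-aware prompts, the sketch below wraps a unified dialogue record and a domain prompt into one input/output training example; the field names and the to_instruction_example helper are hypothetical, not DialogStudio's exact schema or tooling.

```python
# Hypothetical sketch: turn a dialogue plus a domain-aware prompt into an
# instruction-tuning example.
def to_instruction_example(dialogue, domain_prompt):
    """dialogue: list of {"speaker": ..., "text": ...} turns; the last turn is the target."""
    history = "\n".join(f'{t["speaker"]}: {t["text"]}' for t in dialogue[:-1])
    prompt = f"{domain_prompt}\n\nDialogue history:\n{history}\n\nResponse:"
    return {"input": prompt, "output": dialogue[-1]["text"]}

example = to_instruction_example(
    [{"speaker": "user", "text": "I need a cheap hotel in the centre."},
     {"speaker": "system", "text": "Alexander B&B is a cheap option in the centre."}],
    "You are a task-oriented assistant helping with hotel bookings.",
)
print(example["input"])
```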

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLMs' explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
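
One way to read the precision metric: over a set of counterfactuals of the explained input, count how often the model's actual answer matches the answer a reader would infer from the explanation alone. The sketch below is an assumed formalization for illustration (counterfactual_precision is a hypothetical helper, not the authors' evaluation code).

```python
# Hypothetical sketch of counterfactual-simulatability precision.
def counterfactual_precision(counterfactuals, model_answer, human_inference):
    """
    counterfactuals -- counterfactual inputs (e.g. "Can penguins fly?")
    model_answer    -- callable: input -> the model's actual answer
    human_inference -- callable: input -> the answer a reader infers from the
                       original explanation, or None if it is uninformative
    """
    simulatable = [c for c in counterfactuals if human_inference(c) is not None]
    if not simulatable:
        return 0.0
    matches = sum(model_answer(c) == human_inference(c) for c in simulatable)
    return matches / len(simulatable)

# Toy example mirroring the abstract: "all birds can fly" leads a reader to
# expect "yes" for penguins, but the model actually answers "no".
print(counterfactual_precision(
    ["Can penguins fly?", "Can sparrows fly?"],
    model_answer=lambda q: "no" if "penguin" in q else "yes",
    human_inference=lambda q: "yes",
))  # 0.5
```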

Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler

Jul 01, 2023
Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann

Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs, which can be effectively applied to edge devices and cloud services. In this paper, we propose a novel Knowledge-driven Differential Filter Sampler~(KDFS) with Masked Filter Modeling~(MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-trained model in a differential and non-alternating optimization. Specifically, we design a differential sampler with learnable sampling parameters to build a binary mask vector for each layer, determining whether the corresponding filters are redundant. To learn the mask, we introduce masked filter modeling to construct PCA-like knowledge by aligning the intermediate features from the pre-trained teacher model and the outputs of the student decoder taking sampling features as the input. The mask and sampler are directly optimized by the Gumbel-Softmax Straight-Through Gradient Estimator in an end-to-end manner in combination with a global pruning constraint, MFM reconstruction error, and dark knowledge. Extensive experiments demonstrate the proposed KDFS's effectiveness in compressing the base models on various datasets. For instance, the pruned ResNet-50 on ImageNet achieves $55.36\%$ computation reduction, and $42.86\%$ parameter reduction, while only dropping $0.35\%$ Top-1 accuracy, significantly outperforming the state-of-the-art methods. The code is available at \url{https://github.com/Osilly/KDFS}.
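
The differential sampler can be illustrated with a short, self-contained sketch: per-filter logits are turned into a hard 0/1 keep mask with the Gumbel-Softmax straight-through estimator, so the pruning decision stays trainable end to end. The shapes and the FilterSampler class are toy assumptions, not the released KDFS code.

```python
# Hypothetical sketch of a differentiable filter sampler with a straight-through mask.
import torch
import torch.nn.functional as F

class FilterSampler(torch.nn.Module):
    def __init__(self, num_filters, tau=1.0):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_filters, 2))  # [keep, prune] logits
        self.tau = tau

    def forward(self):
        # hard=True gives one-hot samples in the forward pass while gradients
        # flow through the soft Gumbel-Softmax relaxation (straight-through).
        sample = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        return sample[:, 0]                        # 1 = keep the filter

sampler = FilterSampler(num_filters=64)
mask = sampler()                                    # (64,) binary, differentiable
conv_out = torch.randn(8, 64, 16, 16)               # activations of one conv layer
pruned = conv_out * mask.view(1, -1, 1, 1)          # zero out sampled-away filters
print(int(mask.sum().item()), pruned.shape)
```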

IdEALS: Idiomatic Expressions for Advancement of Language Skills

May 24, 2023
Narutatsu Ri, Bill Sun, Sam Davidson, Zhou Yu

Although significant progress has been made in developing methods for Grammatical Error Correction (GEC), progress on word-choice improvements has been notably lacking, and enhancing sentence expressivity by replacing phrases with more advanced expressions remains understudied. In this paper, we focus on this area and present our investigation into the task of incorporating idiomatic expressions into student writing. To facilitate our study, we curate extensive training sets and expert-annotated test sets using real-world data, evaluate various approaches, and compare their performance against that of human experts.

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment

May 23, 2023
Sky CH-Wang, Arkadiy Saakyan, Oliver Li, Zhou Yu, Smaranda Muresan

Designing systems that can reason across cultures requires that they are grounded in the norms of the contexts in which they operate. However, current research on developing computational models of social norms has primarily focused on American society. Here, we propose a novel approach to discover and compare descriptive social norms across Chinese and American cultures. We demonstrate our approach by leveraging discussions on a Chinese Q&A platform, Zhihu, and the existing SocialChemistry dataset as proxies for contrasting cultural axes, aligning social situations cross-culturally, and extracting social norms from texts using in-context learning. Embedding Chain-of-Thought prompting in a human-AI collaborative framework, we build a high-quality dataset of 3,069 social norms aligned with social situations across Chinese and American cultures, alongside corresponding free-text explanations. To test the ability of models to reason about social norms across cultures, we introduce the task of explainable social norm entailment, showing that existing models under 3B parameters have significant room for improvement in both automatic and human evaluation. Further analysis of cross-cultural norm differences based on our dataset shows empirical alignment with the social orientations framework, revealing several situational and descriptive nuances in norms across these cultures.
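
One plausible reading of the cross-cultural situational alignment step is nearest-neighbor matching in a shared embedding space; the sketch below illustrates that assumption with toy vectors (align_situations is a hypothetical helper, not necessarily the paper's exact procedure).

```python
# Hypothetical sketch: match each Zhihu situation to its closest Social Chemistry
# situation by cosine similarity, then compare the norms attached to each side.
import numpy as np

def align_situations(zhihu_embs, socialchem_embs):
    """Return, for each Zhihu situation, the index of the most similar US situation."""
    a = zhihu_embs / np.linalg.norm(zhihu_embs, axis=1, keepdims=True)
    b = socialchem_embs / np.linalg.norm(socialchem_embs, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)                 # cosine-similarity argmax

# Toy embeddings; in practice these would come from a sentence encoder.
rng = np.random.default_rng(0)
print(align_situations(rng.normal(size=(5, 32)), rng.normal(size=(7, 32))))
```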

Using Textual Interface to Align External Knowledge for End-to-End Task-Oriented Dialogue Systems

May 23, 2023
Qingyang Wu, Deema Alnuhait, Derek Chen, Zhou Yu

Traditional end-to-end task-oriented dialogue systems have been built with a modularized design. However, such a design often causes misalignment between the agent response and external knowledge, due to inadequate representation of information. Furthermore, its evaluation metrics emphasize assessing the agent's pre-lexicalization response, neglecting the quality of the completed response. In this work, we propose a novel paradigm that uses a textual interface to align external knowledge and eliminate redundant processes. We demonstrate our paradigm in practice through MultiWOZ-Remake, including an interactive textual interface built for the MultiWOZ database and a correspondingly re-processed dataset. We train an end-to-end dialogue system to evaluate this new dataset. The experimental results show that our approach generates more natural final responses and achieves a greater task success rate compared to the previous models.
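
To make the textual-interface idea concrete, the sketch below renders a database lookup as plain text that the end-to-end model can read and copy entities from, instead of consuming structured rows; the schema and the textual_lookup helper are hypothetical, not the MultiWOZ-Remake interface.

```python
# Hypothetical sketch: a database query whose result is returned as natural text.
def textual_lookup(db, domain, constraints):
    rows = [r for r in db[domain] if all(r.get(k) == v for k, v in constraints.items())]
    if not rows:
        return f"No {domain} matches the constraints {constraints}."
    lines = [f'- {r["name"]} ({r["area"]}, {r["pricerange"]})' for r in rows[:3]]
    return f"Found {len(rows)} matching {domain}(s):\n" + "\n".join(lines)

db = {"hotel": [
    {"name": "Alexander B&B", "area": "centre", "pricerange": "cheap"},
    {"name": "Huntingdon Marriott", "area": "west", "pricerange": "expensive"},
]}
print(textual_lookup(db, "hotel", {"area": "centre", "pricerange": "cheap"}))
```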

Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning

May 23, 2023
Xiao Yu, Maximillian Chen, Zhou Yu

Planning for goal-oriented dialogue often requires simulating future dialogue interactions and estimating task progress. Many approaches thus consider training neural networks to perform look-ahead search algorithms such as A* search and Monte Carlo Tree Search (MCTS). However, this training often requires abundant annotated data, which creates challenges when faced with noisy annotations or low-resource settings. We introduce GDP-Zero, an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training. GDP-Zero prompts a large language model to act as a policy prior, value function, user simulator, and system model during the tree search. We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that its responses are preferred over ChatGPT up to 59.32% of the time, and are rated more persuasive than ChatGPT during interactive evaluations.
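
A stripped-down sketch in the spirit of the open-loop search described above: a prompted LLM proposes candidate system responses (policy prior), simulates the user's reply, and scores rollouts (value), while PUCT-style selection allocates simulations at the root. The function and the propose/simulate_user/value stubs are hypothetical stand-ins, not the GDP-Zero implementation.

```python
# Hypothetical sketch of LLM-guided tree search over candidate responses.
import math, random

def mcts_pick_response(history, propose, simulate_user, value, n_sims=20, c_puct=1.0):
    candidates = propose(history)                        # LLM as policy prior
    stats = {a: {"n": 0, "q": 0.0} for a in candidates}
    for _ in range(n_sims):
        total = sum(s["n"] for s in stats.values()) + 1  # PUCT-style selection
        a = max(stats, key=lambda c: stats[c]["q"]
                + c_puct * math.sqrt(total) / (1 + stats[c]["n"]))
        rollout = history + [("system", a), ("user", simulate_user(history, a))]
        v = value(rollout)                               # LLM as value estimator
        s = stats[a]
        s["n"] += 1
        s["q"] += (v - s["q"]) / s["n"]
    return max(stats, key=lambda c: stats[c]["n"])       # most-visited response

# Toy stubs standing in for the prompted LLM calls.
best = mcts_pick_response(
    history=[("user", "I'm not sure donating is worth it.")],
    propose=lambda h: ["Share a statistic about impact.", "Ask about their concerns."],
    simulate_user=lambda h, a: "Hmm, tell me more.",
    value=lambda rollout: random.random(),
)
print(best)
```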
