Minlie Huang

Acceleration of Federated Learning with Alleviated Forgetting in Local Training

Mar 05, 2022
Chencheng Xu, Zhiwei Hong, Minlie Huang, Tao Jiang

Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy: local models are trained independently on each client, and their parameters are then aggregated on a central server to produce an effective global model. Although a variety of FL algorithms have been proposed, their training efficiency remains low when the data are not independently and identically distributed (non-i.i.d.) across clients. We observe that the slow convergence of existing methods is (at least partially) caused by catastrophic forgetting during the local training stage on each individual client, which leads to a large increase in the loss on the training data previously seen at other clients. Here, we propose FedReg, an algorithm that accelerates FL by alleviating knowledge forgetting in the local training stage: it regularizes locally trained parameters with the loss on generated pseudo data, which encode the knowledge of previous training data learned by the global model. Our comprehensive experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep and the clients' data are extremely non-i.i.d., but also better protects privacy in classification problems and is more robust against gradient inversion attacks. The code is available at: https://github.com/Zoesgithub/FedReg.

* In International Conference on Learning Representations (2021, Sept)
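
The abstract describes the generic FL round: train locally on each client, then aggregate parameters on the server. As a minimal sketch of the aggregation step only, assuming FedAvg-style averaging weighted by each client's number of training examples (a common baseline, not necessarily the exact aggregation FedReg uses), with a hypothetical `fedavg` helper and plain Python lists standing in for parameter tensors:

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its local dataset size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("expected one dataset size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n / total  # clients with more data pull the average harder
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


# Two clients holding 1 and 3 local examples respectively.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

FedReg's contribution sits in the local step rather than here: per the abstract, each client's local training adds a regularization term based on the loss on generated pseudo data, which this sketch does not model.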

Rethinking and Refining the Distinct Metric

Feb 28, 2022
Siyang Liu, Sahand Sabour, Yinhe Zheng, Pei Ke, Xiaoyan Zhu, Minlie Huang

Distinct is a widely used automatic metric for evaluating the diversity of language generation tasks. However, we observe that the original approach to calculating distinct scores is evidently biased: it tends to impose higher penalties on longer sequences. In this paper, we refine the calculation of distinct scores by re-scaling the number of distinct tokens based on its expectation. We provide both empirical and theoretical evidence to show that our method effectively removes the biases exhibited in the original distinct score. Further analyses also demonstrate that the refined score correlates better with human evaluations.

* 4 pages, to be published at ACL 2022
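
The refinement can be sketched under a simplifying assumption: if each of the C tokens in a generated text were drawn independently and uniformly from a vocabulary of V types, the expected number of distinct types would be V(1 - ((V-1)/V)^C). The original score divides the distinct count by C, and that ratio shrinks as C grows even for uniformly random text; dividing instead by the expectation removes this length effect. The uniform-sampling assumption, the choice of V (especially for n > 1), and all function names below are ours for illustration; consult the paper for the exact definition.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(tokens: Sequence[str], n: int = 1) -> float:
    """Original Distinct-n: unique n-grams divided by total n-grams."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


def expected_distinct(total: int, vocab_size: int) -> float:
    """Expected number of distinct types when `total` items are drawn
    independently and uniformly from `vocab_size` types."""
    return vocab_size * (1.0 - ((vocab_size - 1) / vocab_size) ** total)


def expectation_adjusted_distinct(tokens: Sequence[str], vocab_size: int, n: int = 1) -> float:
    """Rescale the distinct count by its expectation instead of by the raw length."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return len(set(grams)) / expected_distinct(len(grams), vocab_size)


tokens = "the cat sat on the mat".split()
print(distinct_n(tokens))  # 5 unique / 6 total
print(expectation_adjusted_distinct(tokens, vocab_size=10))
```

Note that the adjusted score is not capped at 1: with a tiny vocabulary, as in this toy call, a text can be more diverse than the uniform-sampling expectation.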

AugESC: Large-scale Data Augmentation for Emotional Support Conversation with Pre-trained Language Models

Feb 26, 2022
Chujie Zheng, Sahand Sabour, Jiaxin Wen, Minlie Huang

Crowdsourcing is commonly adopted for dialog data collection. However, it is very costly and time-consuming, and the collected data are limited in scale and topic coverage. In this paper, aiming to generate emotional support conversations, we propose exploiting large-scale pre-trained language models for data augmentation, and we report key findings from our pilot exploration. Our approach leverages the 6B-parameter GPT-J model and uses publicly available dialog posts to trigger conversations on various topics. We then construct AugESC, a machine-augmented dataset for emotional support conversation. It is two orders of magnitude larger than the original ESConv dataset, covers more diverse topics, and is shown by human evaluation to be of high quality. Lastly, we demonstrate with interactive evaluation that AugESC can further enhance dialog models tuned on ESConv, enabling them to handle various conversation topics and to provide significantly more effective emotional support.

* Work in progress

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

Feb 16, 2022
Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen Meng

Research on open-domain dialog systems has been greatly advanced by neural models trained on large-scale corpora; however, such corpora often introduce various safety problems (e.g., offensive language, biases, and toxic behaviors) that significantly hinder the deployment of dialog systems in practice. Among these safety issues, social bias is especially complex to address, as its negative impact on marginalized populations is usually expressed implicitly and thus requires normative reasoning and rigorous analysis. In this paper, we focus our investigation on social bias detection as a dialog safety problem. We first propose a novel Dial-Bias Frame for analyzing social bias in conversations pragmatically, which supports more comprehensive bias-related analyses than simple dichotomous annotations. Based on the proposed framework, we further introduce the CDail-Bias Dataset, which is, to our knowledge, the first well-annotated Chinese social bias dialog dataset. In addition, we establish several dialog bias detection benchmarks at different label granularities and input types (utterance-level and context-level). We show that the in-depth analyses in our Dial-Bias Frame, together with these benchmarks, are essential to bias detection tasks and can benefit building safe dialog systems in practice.

Youling: an AI-Assisted Lyrics Creation System

Jan 18, 2022
Rongsheng Zhang, Xiaoxi Mao, Le Li, Lin Jiang, Lin Chen, Zhiwei Hu, Yadong Xi, Changjie Fan, Minlie Huang

Recently, a variety of neural models have been proposed for lyrics generation. However, most previous work completes the generation process in a single pass with little human intervention. We believe that lyrics creation is a creative process centered on human intelligence. AI should act as an assistant in the lyrics creation process, where human interaction is crucial for high-quality creation. This paper demonstrates Youling, an AI-assisted lyrics creation system designed to collaborate with music creators. In the lyrics generation process, Youling supports a traditional one-pass full-text generation mode as well as an interactive generation mode, which allows users to select satisfactory sentences from generated candidates conditioned on the preceding context. The system also provides a revision module that enables users to repeatedly revise undesired sentences or words in the lyrics. Besides, Youling allows users to control the content and format of generated lyrics with multifaceted attributes. The demo video of the system is available at https://youtu.be/DFeNpHk0pm4.

* Accepted by EMNLP 2020 demo track

COLD: A Benchmark for Chinese Offensive Language Detection

Jan 16, 2022
Jiawen Deng, Jingyan Zhou, Hao Sun, Fei Mi, Minlie Huang

Offensive language detection and prevention are becoming increasingly critical for maintaining healthy social platforms and for the safe deployment of language models. Despite plentiful research on toxic and offensive language in NLP, existing studies mainly focus on English, while few involve Chinese due to limited resources. To facilitate Chinese offensive language detection and model evaluation, we collect COLDataset, a Chinese offensive language dataset containing 37k annotated sentences. With this high-quality dataset, we provide a strong baseline classifier, COLDetector, which achieves 81% accuracy on offensive language detection. Furthermore, we use the proposed COLDetector to study the output offensiveness of popular Chinese language models (CDialGPT and CPM). We find that (1) CPM tends to generate more offensive output than CDialGPT, and (2) certain types of prompts, such as anti-bias sentences, can trigger offensive outputs more easily. Altogether, our resources and analyses are intended to help detoxify Chinese online communities and evaluate the safety performance of generative language models. Disclaimer: The paper contains example data that may be considered profane, vulgar, or offensive.

* 10 pages

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

Dec 27, 2021
Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie, Zheni Zeng, Yuxian Gu, Kun Zhou, Xuancheng Huang, Wenhao Li, Shuhuai Ren, Jinliang Lu, Chengqiang Xu, Huadong Wang, Guoyang Zeng, Zile Zhou, Jiajun Zhang, Juanzi Li, Minlie Huang, Rui Yan, Xiaodong He, Xiaojun Wan, Xin Zhao, Xu Sun, Yang Liu, Zhiyuan Liu, Xianpei Han, Erhong Yang, Zhifang Sui, Maosong Sun

Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.
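
One way to picture the capability-task-dataset hierarchy, and the multi-level scores it makes possible, is a nested mapping whose scores are rolled up level by level. This sketch assumes plain unweighted means at every level and uses hypothetical capability, task, and dataset names; CUGE's actual scoring may normalize or weight datasets differently.

```python
from statistics import mean
from typing import Dict, Tuple

# capability -> task -> dataset -> score (hypothetical layout)
Results = Dict[str, Dict[str, Dict[str, float]]]


def multi_level_scores(results: Results) -> Tuple[Dict[Tuple[str, str], float], Dict[str, float], float]:
    """Aggregate dataset scores into task, capability, and overall scores."""
    task_scores: Dict[Tuple[str, str], float] = {}
    capability_scores: Dict[str, float] = {}
    for capability, tasks in results.items():
        per_task = []
        for task, datasets in tasks.items():
            score = mean(list(datasets.values()))  # task = mean over its datasets
            task_scores[(capability, task)] = score
            per_task.append(score)
        capability_scores[capability] = mean(per_task)  # capability = mean over its tasks
    overall = mean(list(capability_scores.values()))  # overall = mean over capabilities
    return task_scores, capability_scores, overall


results = {
    "understanding": {"classification": {"ds_a": 80.0, "ds_b": 60.0}, "ner": {"ds_c": 90.0}},
    "generation": {"summarization": {"ds_d": 40.0}},
}
tasks, capabilities, overall = multi_level_scores(results)
print(capabilities)  # {'understanding': 80.0, 'generation': 40.0}
print(overall)       # 60.0
```

Averaging per level means a capability with many datasets does not dominate the overall score, which is one motivation for reporting scores at each level of the hierarchy.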

Unsupervised Domain Adaptation with Adapter

Nov 01, 2021
Rongsheng Zhang, Yinhe Zheng, Xiaoxi Mao, Minlie Huang

Unsupervised domain adaptation (UDA) with pre-trained language models (PrLMs) has achieved promising results, since these pre-trained models embed generic knowledge learned from various domains. However, fine-tuning all the parameters of a PrLM on a small domain-specific corpus distorts the learned generic knowledge, and it is also expensive to deploy a whole fine-tuned PrLM for each domain. This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation. Specifically, several trainable adapter modules are inserted into a PrLM, and the embedded generic knowledge is preserved by keeping the parameters of the original PrLM fixed during fine-tuning. A domain-fusion scheme is introduced to train these adapters on a mixed-domain corpus to better capture transferable features. Detailed experiments on two benchmark datasets demonstrate that our approach is effective across different tasks, dataset sizes, and domain similarities.

* Accepted by NeurIPS 2021 workshop
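
The abstract does not specify the adapter architecture, so this sketch assumes the common bottleneck design (down-projection, nonlinearity, up-projection, residual connection) applied to the output of a frozen PrLM layer. The sizes, the zero initialization, the omission of biases, and the hypothetical `Adapter` class are illustrative choices rather than the paper's configuration, and the domain-fusion training scheme is not modeled.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class Adapter:
    """Bottleneck adapter applied to a frozen layer's output h: h + W_up ReLU(W_down h)."""

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (hidden -> bottleneck) with a small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
        # Up-projection (bottleneck -> hidden) zero-initialized, so the adapter starts as the identity.
        self.w_up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(self.w_down, h)]  # ReLU in the bottleneck
        delta = matvec(self.w_up, z)
        return [a + b for a, b in zip(h, delta)]  # residual connection

    def num_trainable(self) -> int:
        # Only these weights would be updated; the PrLM's own parameters stay frozen.
        return sum(map(len, self.w_down)) + sum(map(len, self.w_up))


adapter = Adapter(hidden=768, bottleneck=64)
print(adapter.num_trainable())  # 98304, versus 589824 for a single 768x768 dense matrix
```

The parameter count shows why per-domain deployment becomes cheaper: each domain only needs its own small set of adapter weights on top of one shared, frozen PrLM.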

Tackling Multi-Answer Open-Domain Questions via a Recall-then-Verify Framework

Oct 16, 2021
Zhihong Shao, Minlie Huang

Open-domain questions are likely to be open-ended and ambiguous, leading to multiple valid answers. Existing approaches typically adopt the rerank-then-read framework, where a reader reads top-ranked evidence to predict answers. According to our empirical analyses, this framework faces three problems. First, to leverage the power of a large reader, the reranker is forced to select only a few relevant passages that cover diverse answers, which is non-trivial because the effect on the reader's performance is unknown. Second, the small reading budget also prevents the reader from making use of valuable retrieved evidence filtered out by the reranker. Third, as the reader generates predictions all at once based on all selected evidence, it may learn pathological dependencies among answers, i.e., whether to predict an answer may also depend on evidence for the other answers. To avoid these problems, we propose to tackle multi-answer open-domain questions with a recall-then-verify framework, which separates the reasoning process for each answer so that we can make better use of retrieved evidence while still leveraging the power of large models under the same memory constraint. Our framework achieves new state-of-the-art results on two multi-answer datasets and predicts significantly more gold answers than a rerank-then-read system with an oracle reranker.
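
The key structural difference from rerank-then-read is that every candidate answer is verified on its own evidence, so the decision about one answer cannot depend on another answer's evidence. A minimal sketch, assuming the recaller and verifier are supplied as callables and a fixed acceptance threshold is used; in the paper these components are neural models, and the toy lexical `toy_recall` and `toy_verify` functions below are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence


def recall_then_verify(
    passages: Sequence[str],
    recall: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], float],
    threshold: float = 0.5,
) -> List[str]:
    # Recall: collect candidates from every retrieved passage, not only a reranked few.
    evidence: Dict[str, List[str]] = {}
    for p in passages:
        for cand in dict.fromkeys(recall(p)):  # dedupe candidates within one passage
            evidence.setdefault(cand, []).append(p)
    # Verify: judge each candidate independently, using only its own evidence.
    return [c for c, ev in evidence.items() if verify(c, ev) >= threshold]


passages = ["the answer is Paris", "maybe Lyon or Paris", "Paris again"]


def toy_recall(passage: str) -> List[str]:
    # Toy recaller: capitalized words are candidate answers.
    return [w for w in passage.split() if w[:1].isupper()]


def toy_verify(candidate: str, ev: List[str]) -> float:
    # Toy verifier: share of passages that support the candidate.
    return len(ev) / len(passages)


print(recall_then_verify(passages, toy_recall, toy_verify))  # ['Paris']
```

Because each verification call sees only one candidate's evidence, the verifier's memory cost is bounded per answer, which is how this design can use a large model without truncating the retrieved evidence to a small reading budget.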

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Oct 16, 2021
Hao Sun, Guangxuan Xu, Jiawen Deng, Jiale Cheng, Chujie Zheng, Hao Zhou, Nanyun Peng, Xiaoyan Zhu, Minlie Huang

Dialogue safety problems severely limit the real-world deployment of neural conversational models and have recently attracted great research interest. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset covering 6 unsafe categories with rich context-sensitive unsafe examples. Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset. As a remedy, we train a context-level dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue systems still suffer from context-sensitive safety problems.
