
Benjamin Yao


PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer

Jun 13, 2023
Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Xiaohu Liu, Chenlei Guo

Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit persona descriptions. By representing each persona as a continuous vector, PersonaPKT learns implicit persona-specific features directly from a small number of dialogue samples produced by the same persona, adding less than 0.1% trainable parameters for each persona on top of the PLM backbone. Empirical results demonstrate that PersonaPKT effectively builds personalized DAs with high storage efficiency, outperforming various baselines in terms of persona consistency while maintaining good response generation quality. In addition, it enhances privacy protection by avoiding explicit persona descriptions. Overall, PersonaPKT is an effective solution for creating personalized DAs that respect user privacy.
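The "persona as a continuous vector" idea is close in spirit to prefix/prompt tuning: a tiny block of trainable embeddings is prepended to a frozen PLM and optimized on that persona's dialogue samples. The sketch below is a minimal illustration of that style of parameter-efficient transfer under assumed choices (GPT-2 backbone, prefix length, training loop); it is not the authors' implementation.

```python
# Minimal sketch of persona-as-prefix tuning on a frozen PLM (illustrative only;
# the backbone, prefix length, and data handling are assumptions, not the paper's code).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
for p in model.parameters():          # freeze the PLM backbone
    p.requires_grad = False

PREFIX_LEN = 8                        # a handful of "persona vectors"
embed_dim = model.config.n_embd
# One tiny trainable matrix per persona, well under 0.1% of the backbone's parameters.
persona_prefix = torch.nn.Parameter(torch.randn(PREFIX_LEN, embed_dim) * 0.02)
optimizer = torch.optim.AdamW([persona_prefix], lr=5e-4)

def train_step(dialogue_text: str) -> float:
    enc = tokenizer(dialogue_text, return_tensors="pt")
    tok_embeds = model.get_input_embeddings()(enc["input_ids"])          # (1, T, d)
    inputs_embeds = torch.cat([persona_prefix.unsqueeze(0), tok_embeds], dim=1)
    # Mask the prefix positions out of the LM loss (-100 labels are ignored).
    labels = torch.cat([torch.full((1, PREFIX_LEN), -100), enc["input_ids"]], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    out = model(inputs_embeds=inputs_embeds,
                attention_mask=attention_mask,
                labels=labels)
    out.loss.backward()               # gradients flow only into the persona prefix
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

print(train_step("User: What do you do for fun?\nAgent: I spend weekends hiking with my dog."))
```

With a prefix of 8 vectors over GPT-2's 768-dimensional embeddings, the per-persona budget is 6,144 parameters, far below 0.1% of the roughly 124M-parameter backbone.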

* 10 pages, 3 figures, accepted to SustaiNLP 2023 

KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness

May 02, 2023
Yichuan Li, Jialong Han, Kyumin Lee, Chengyuan Ma, Benjamin Yao, Derek Liu

In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpora and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) incorporate the interactions between tokens and mentioned entities during pre-training and are thus more effective on entity-centric tasks such as entity linking and relation classification. Although they exploit Wikipedia's rich structure to some extent, conventional KEPLMs still neglect a unique layout of the corpus: each Wikipedia page is organized around a topic entity (identified by the page URL and shown in the page title). In this paper, we demonstrate that KEPLMs that do not incorporate topic entities suffer from insufficient entity interaction and biased (relation) word semantics. We thus propose KEPLET, a novel Knowledge-Enhanced Pre-trained LanguagE model with Topic entity awareness. In an end-to-end manner, KEPLET identifies where to add the topic entity's information in a Wikipedia sentence, fuses such information into token and mentioned-entity representations, and supervises the network learning, through which it takes topic entities back into consideration. Experiments demonstrate the generality and superiority of KEPLET: applied to two representative KEPLMs, it achieves significant improvements on four entity-centric tasks.
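The fusion step described above, injecting the topic entity's information into token and mentioned-entity representations, can be pictured as a gated fusion layer. The module below is a hypothetical illustration of that idea, not KEPLET's actual architecture; the class name, gating scheme, and dimensions are made up for the example.

```python
# Hypothetical gated fusion of a topic-entity embedding into token representations
# (an illustration of the idea only, not KEPLET's actual layers).
import torch
import torch.nn as nn

class TopicEntityFusion(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, token_states: torch.Tensor, topic_emb: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden); topic_emb: (batch, hidden)
        topic = topic_emb.unsqueeze(1).expand_as(token_states)
        g = torch.sigmoid(self.gate(torch.cat([token_states, topic], dim=-1)))
        # Each token position decides how much topic-entity information to absorb.
        return token_states + g * self.proj(topic)

fusion = TopicEntityFusion(hidden_dim=768)
tokens = torch.randn(2, 16, 768)       # e.g., BERT-sized hidden states
topic = torch.randn(2, 768)            # embedding of the page's topic entity
print(fusion(tokens, topic).shape)     # torch.Size([2, 16, 768])
```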

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

May 29, 2020
Dookun Park, Hao Yuan, Dongmin Kim, Yinglei Zhang, Matsoukas Spyros, Young-Bum Kim, Ruhi Sarikaya, Edward Guo, Yuan Ling, Kevin Quinn, Pham Hung, Benjamin Yao, Sungjin Lee

Measuring user satisfaction is a challenging task and a critical component in developing large-scale conversational agent systems that serve the needs of real users. A widely used approach is to collect human annotation data and use it for evaluation or modeling. Human-annotation-based approaches are easier to control but hard to scale. A novel alternative is to collect users' direct feedback via a feedback elicitation system embedded in the conversational agent, and use the collected feedback to train a machine-learned model for generalization. User feedback is the best proxy for user satisfaction, but it is not available for ineligible intents and in certain situations. The two types of approaches are therefore complementary. In this work, we tackle the user satisfaction assessment problem with a hybrid approach that fuses explicit user feedback with user satisfaction predictions inferred by two machine-learned models, one trained on user feedback data and the other on human annotation data. The hybrid approach is based on a waterfall policy, and experimental results on Amazon Alexa's large-scale datasets show significant improvements in inferring user satisfaction. We present a detailed hybrid architecture, an in-depth analysis of the user feedback data, and an algorithm that generates datasets to properly simulate live traffic.
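The waterfall policy can be read as a simple precedence rule: use explicit user feedback when it exists, otherwise fall back to the feedback-trained model, and finally to the annotation-trained model. The function below is a schematic reading of that policy based only on the abstract; the signal names and threshold are assumptions, not the production system.

```python
# Schematic waterfall policy for fusing satisfaction signals
# (signal names, ordering, and threshold are assumptions based on the abstract).
from typing import Optional

def waterfall_satisfaction(
    explicit_feedback: Optional[bool],
    feedback_model_score: Optional[float],
    annotation_model_score: float,
    threshold: float = 0.5,
) -> bool:
    """Return True if the turn is judged satisfactory."""
    # 1) Explicit user feedback is the best proxy for satisfaction: trust it first.
    if explicit_feedback is not None:
        return explicit_feedback
    # 2) Next, the model trained on user-feedback data (when the intent is eligible).
    if feedback_model_score is not None:
        return feedback_model_score >= threshold
    # 3) Finally, fall back to the model trained on human annotations.
    return annotation_model_score >= threshold

# Example: no explicit feedback, feedback-model prediction available.
print(waterfall_satisfaction(None, 0.72, 0.40))   # True
```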

Knowledge Distillation from Internal Representations

Oct 08, 2019
Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation.
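A common way to realize this idea is to add a hidden-state matching term on selected layers to the usual soft-label objective. The loss below is a generic sketch of that combination under assumed choices (MSE for the internal term, a fixed layer pairing, equal weighting); it is not the paper's exact formulation.

```python
# Generic sketch of soft-label distillation plus an internal-representation matching
# term (layer mapping and loss weights are assumptions, not the paper's exact setup).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      temperature: float = 2.0,
                      alpha: float = 0.5):
    # Soft-label term: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Internal term: match selected student layers to their paired teacher layers.
    internal = sum(
        F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden)
    ) / len(student_hidden)
    return alpha * soft + (1.0 - alpha) * internal

# Toy shapes: batch of 4, two matched layers, hidden size 768, 3 classes.
s_logits, t_logits = torch.randn(4, 3), torch.randn(4, 3)
s_hid = [torch.randn(4, 16, 768) for _ in range(2)]
t_hid = [torch.randn(4, 16, 768) for _ in range(2)]
print(distillation_loss(s_logits, t_logits, s_hid, t_hid).item())
```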

* Under review 