Yuchi Zhang

Towards Personalized Review Summarization by Modeling Historical Reviews from Customer and Product Separately

Jan 27, 2023
Xin Cheng, Shen Gao, Yuchi Zhang, Yongliang Wang, Xiuying Chen, Mingzhe Li, Dongyan Zhao, Rui Yan

Review summarization is a non-trivial task that aims to summarize the main ideas of a product review on an e-commerce website. Unlike document summarization, which only needs to focus on the main facts described in the document, review summarization should not only summarize the main aspects mentioned in the review but also reflect the personal style of the review author. Although existing review summarization methods have incorporated the historical reviews of both the customer and the product, they usually simply concatenate these two heterogeneous sources of information and model them indiscriminately as one long sequence. Moreover, although rating information provides a high-level abstraction of customer preference, it has not been used by the majority of methods. In this paper, we propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS), which separately models the two types of historical reviews together with the rating information via a graph reasoning module with a contrastive loss. We employ a multi-task framework that conducts review sentiment classification and summarization jointly. Extensive experiments on four benchmark datasets demonstrate the superiority of HHRRS on both tasks.
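
The multi-task objective described here can be read as a weighted sum of a summarization loss, a sentiment-classification loss, and a contrastive term over the two separately encoded review histories. Below is a minimal PyTorch sketch of that reading; all function and argument names, the InfoNCE form of the contrastive term, and the loss weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an HHRRS-style multi-task objective (assumed form).
import torch
import torch.nn.functional as F

def contrastive_loss(cust_repr, prod_repr, temperature=0.1):
    """Pull matched customer/product history encodings together and
    push mismatched pairs in the batch apart (InfoNCE-style)."""
    cust = F.normalize(cust_repr, dim=-1)               # (B, d)
    prod = F.normalize(prod_repr, dim=-1)               # (B, d)
    logits = cust @ prod.t() / temperature              # (B, B) similarities
    targets = torch.arange(cust.size(0), device=cust.device)
    return F.cross_entropy(logits, targets)             # diagonal = positives

def multi_task_loss(sum_logits, sum_labels, sent_logits, sent_labels,
                    cust_repr, prod_repr, lam_sent=0.5, lam_con=0.1):
    # Summarization: token-level cross-entropy over the decoder vocabulary.
    l_sum = F.cross_entropy(sum_logits.view(-1, sum_logits.size(-1)),
                            sum_labels.view(-1), ignore_index=-100)
    # Review sentiment classification (e.g., predicting the rating).
    l_sent = F.cross_entropy(sent_logits, sent_labels)
    # Contrastive term over the two separately encoded review histories.
    l_con = contrastive_loss(cust_repr, prod_repr)
    return l_sum + lam_sent * l_sent + lam_con * l_con
```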

Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations

Mar 08, 2022
Ruijie Yan, Shuang Peng, Haitao Mi, Liang Jiang, Shihui Yang, Yuchi Zhang, Jiajun Li, Liangrui Peng, Yongliang Wang, Zujie Wen

Building robust and general dialogue models for spoken conversations is challenging due to the gap between the distributions of spoken and written data. This paper presents our approach to building generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10. To mitigate the discrepancies between spoken and written text, we mainly employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation. To train robust models for spoken conversations, we improve pre-trained language models and apply ensemble algorithms for each sub-task. Specifically, for the detection task, we fine-tune RoBERTa and ELECTRA and run an error-fixing ensemble algorithm. For the selection task, we adopt a two-stage framework that consists of entity tracking and knowledge ranking, and propose a multi-task learning method that learns multi-level semantic information through domain classification and entity selection. For the generation task, we adopt a cross-validation data process to improve pre-trained generative language models, followed by a consensus decoding algorithm that can add arbitrary features, such as a relative ROUGE metric, and tune the associated feature weights directly toward BLEU. Our approach ranks third on the objective evaluation and second on the final official human evaluation.
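
As a concrete illustration of one piece of this pipeline, artificial error injection can be as simple as character-level noise that mimics ASR-style corruption of written text. The toy Python sketch below works under that assumption; the paper's actual augmentation, including the round-trip text-speech transformation, is considerably more elaborate.

```python
# Toy sketch of character-level error injection (assumed noise model).
import random

def inject_errors(text, p=0.05, seed=None):
    """Randomly drop, duplicate, or swap characters to mimic ASR noise."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p:                                # drop this character
            i += 1
        elif r < 2 * p:                          # duplicate it
            out.extend([chars[i], chars[i]])
            i += 1
        elif r < 3 * p and i + 1 < len(chars):   # swap with the next one
            out.extend([chars[i + 1], chars[i]])
            i += 2
        else:                                    # keep it unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

print(inject_errors("what restaurants are near the museum", p=0.08, seed=3))
```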

Precognition in Task-oriented Dialogue Understanding: Posterior Regularization by Future Context

Mar 07, 2022
Nan Su, Yuchi Zhang, Chao Liu, Bingzhu Du, Yongliang Wang

Task-oriented dialogue systems have become overwhelmingly popular in recent research. Dialogue understanding is widely used to comprehend users' intent, emotion, and dialogue state in task-oriented dialogue systems. Most previous work on such discriminative tasks models only the current query or the historical conversation. Even when some works model the entire dialogue flow, this is not suitable for real-world task-oriented conversations, where future contexts are not visible. In this paper, we propose to jointly model historical and future information through a posterior regularization method. More specifically, by modeling the current utterance and past contexts as the prior, and the entire dialogue flow as the posterior, we optimize the KL divergence between these distributions to regularize our model during training, while only historical information is used for inference. Extensive experiments on two dialogue datasets validate the effectiveness of our proposed method, which achieves superior results compared with all baseline models.
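
In code, this objective can be viewed as a standard classification loss on the history-only (prior) network plus a KL term pulling it toward the full-dialogue (posterior) network. The PyTorch sketch below is a hedged rendering of that idea; treating the posterior as a detached teacher is our simplification, and the names and weighting are illustrative, not the paper's exact formulation.

```python
# Hedged sketch of a posterior-regularization training loss (assumed form).
import torch
import torch.nn.functional as F

def pr_loss(prior_logits, post_logits, labels, lam=1.0):
    """Cross-entropy on the prior (history-only) prediction, plus a KL
    term pulling the prior toward the posterior (full-dialogue) network."""
    ce = F.cross_entropy(prior_logits, labels)
    kl = F.kl_div(F.log_softmax(prior_logits, dim=-1),
                  F.softmax(post_logits.detach(), dim=-1),
                  reduction="batchmean")
    return ce + lam * kl
```

At inference time only the prior network runs, so no future context is required, matching the paper's setting.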

HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling

Dec 27, 2021
Shen Gao, Yuchi Zhang, Yongliang Wang, Yang Dong, Xiuying Chen, Dongyan Zhao, Rui Yan

Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online user communities for special interests. In these communities, users can post articles, leave comments, raise questions, and answer them. These data form heterogeneous information sources, where each information source has its own special structure and context (comments attached to an article, or related questions with their answers). Most CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question. However, various types of information sources in the community are not fully explored by these CQA methods, and these multiple information sources (MIS) can provide more knowledge related to users' questions. Thus, we propose a question-aware heterogeneous graph transformer that incorporates the MIS in the user community to automatically generate the answer. To evaluate our proposed method, we conduct experiments on two datasets: $\text{MSM}^{\text{plus}}$, a modified version of the benchmark dataset MS-MARCO, and the AntQA dataset, the first large-scale CQA dataset with four types of MIS. Extensive experiments on the two datasets show that our model outperforms all the baselines in terms of all metrics.
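
To make the modeling idea concrete, the sketch below shows question-aware attention over typed nodes, with one projection per node type so each information source keeps its own structure. It follows the generic heterogeneous-graph-attention recipe rather than the paper's exact architecture; the class and parameter names are hypothetical.

```python
# Rough sketch of question-aware attention over typed nodes (articles,
# comments, QA pairs); an assumed simplification, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAwareHetAttention(nn.Module):
    def __init__(self, dim, num_types):
        super().__init__()
        # One projection per node type, so each information source keeps
        # its own structure and semantics.
        self.type_proj = nn.ModuleList([nn.Linear(dim, dim)
                                        for _ in range(num_types)])
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, question, nodes, node_types):
        # question: (d,); nodes: (N, d); node_types: (N,) ints < num_types
        projected = torch.stack([self.type_proj[t](h)
                                 for h, t in zip(nodes, node_types.tolist())])
        q = self.query_proj(question)
        scores = projected @ q / nodes.size(-1) ** 0.5   # (N,) relevance
        weights = F.softmax(scores, dim=0)
        return weights @ projected   # question-aware summary of the MIS graph
```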

* Accepted by WSDM 2022; Code at https://github.com/gsh199449/HeteroQA 

AudioViewer: Learning to Visualize Sound

Dec 28, 2020
Yuchi Zhang, Willis Peng, Bastian Wandt, Helge Rhodin

Sensory substitution can help persons with perceptual deficits. In this work, we attempt to visualize audio with video. Our long-term goal is to create sound perception for hearing impaired people, for instance, to facilitate feedback for training deaf speech. Different from existing models that translate between speech and text or text and images, we target an immediate and low-level translation that applies to generic environment sounds and human speech without delay. No canonical mapping is known for this artificial translation task. Our design is to translate from audio to video by compressing both into a common latent space with shared structure. Our core contribution is the development and evaluation of learned mappings that respect human perception limits and maximize user comfort by enforcing priors and combining strategies from unpaired image translation and disentanglement. We demonstrate qualitatively and quantitatively that our AudioViewer model maintains important audio features in the generated video and that generated videos of faces and numbers are well suited for visualizing high-dimensional audio features since they can easily be parsed by humans to match and distinguish between sounds, words, and speakers.
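
Conceptually, the translation encodes audio into a latent code and decodes that same code with a video decoder trained to share the latent structure. The PyTorch sketch below is only a schematic of this path; the MLP encoder/decoder, dimensions, and names are placeholders, and the paper's actual models and perceptual priors are richer than this.

```python
# Schematic of the audio-to-video path through a shared latent space
# (placeholder architecture, not the AudioViewer implementation).
import torch
import torch.nn as nn

class LatentTranslator(nn.Module):
    def __init__(self, audio_dim=128, latent_dim=32, frame_hw=64):
        super().__init__()
        self.frame_hw = frame_hw
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU(),
                                       nn.Linear(256, latent_dim))
        self.video_dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                       nn.Linear(256, frame_hw * frame_hw))

    def forward(self, audio_feat):              # (B, audio_dim)
        z = self.audio_enc(audio_feat)          # compress into shared space
        frame = self.video_dec(z)               # decode the same code as pixels
        return frame.view(-1, self.frame_hw, self.frame_hw)
```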

Improve Diverse Text Generation by Self Labeling Conditional Variational Auto Encoder

Mar 26, 2019
Yuchi Zhang, Yongliang Wang, Liping Zhang, Zhiqiang Zhang, Kun Gai

Diversity plays a vital role in many text generation applications. In recent years, Conditional Variational Auto-Encoders (CVAE) have shown promising performance on this task. However, they often encounter the so-called KL-vanishing problem. Previous works mitigated this problem with heuristic methods, such as strengthening the encoder or weakening the decoder while optimizing the CVAE objective function. Nevertheless, the optimizing direction of these methods is implicit, and it is hard to find an appropriate degree to which they should be applied. In this paper, we propose an explicit optimization objective that complements the CVAE to directly pull away from KL vanishing. In effect, this objective term guides the encoder toward the "best encoder" of the decoder to enhance expressiveness. A labeling network is introduced to estimate this "best encoder"; it provides a continuous label in the latent space of the CVAE to help build a close connection between latent variables and targets. The whole proposed method is named Self-Labeling CVAE (SLCVAE). To accelerate research on diverse text generation, we also propose a large native one-to-many dataset. Extensive experiments are conducted on two tasks, showing that our method largely improves generation diversity while achieving comparable accuracy to state-of-the-art algorithms.
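
The self-labeling term can be read as an explicit regression target for the encoder, added on top of the usual CVAE evidence lower bound. Below is a loose PyTorch sketch under that reading; the labeling network producing `z_label`, the MSE form of the term, and all names are illustrative assumptions rather than the paper's exact formulation.

```python
# Loose sketch of an SLCVAE-style training loss (assumed form).
import torch
import torch.nn.functional as F

def slcvae_loss(recon_logits, targets, mu, logvar, z_label, lam=1.0):
    # Standard CVAE terms: reconstruction + KL to the prior N(0, I).
    recon = F.cross_entropy(recon_logits.view(-1, recon_logits.size(-1)),
                            targets.view(-1), ignore_index=-100)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Self-labeling term: move the encoder mean toward the labeling
    # network's estimate of the "best" latent code for this target.
    self_label = F.mse_loss(mu, z_label.detach())
    return recon + kl + lam * self_label
```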

* Accepted as a conference paper at ICASSP 2019. This copy is an extended version of the submitted manuscript, with more theoretical analysis and human evaluation.