Bing Yin

Situated Natural Language Explanations

Aug 27, 2023
Zining Zhu, Haoming Jiang, Jingfeng Yang, Sreyashi Nag, Chao Zhang, Jie Huang, Yifan Gao, Frank Rudzicz, Bing Yin

Natural language is among the most accessible tools for explaining decisions to humans, and large pretrained language models (PLMs) have demonstrated impressive abilities to generate coherent natural language explanations (NLE). Existing NLE research perspectives do not take the audience into account: an NLE can have high textual quality yet fail to accommodate the audience's needs and preferences. To address this limitation, we propose an alternative perspective, situated NLE, comprising a situated generation framework and a situated evaluation framework. On the generation side, we propose simple prompt engineering methods that adapt NLEs to situations; in human studies, annotators preferred the situated NLEs. On the evaluation side, we set up automated evaluation scores in lexical, semantic, and pragmatic categories, which can be used to select the most suitable prompts for generating NLEs. Situated NLE provides a perspective for further research on automatic NLE generation.
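To make the generation idea concrete, here is a minimal sketch of situation-aware prompting, assuming a generic text-completion function downstream; the audience profiles and template wording below are illustrative, not the paper's exact prompts.

```python
# A minimal sketch of situation-aware prompt construction. The audience
# descriptions and the template text are illustrative assumptions.

def build_situated_prompt(decision: str, audience: str) -> str:
    """Prepend a description of the audience so the explanation is adapted to them."""
    return (
        f"You are explaining a decision to {audience}.\n"
        f"Decision: {decision}\n"
        "Explain the decision in language suited to this audience:"
    )

if __name__ == "__main__":
    for audience in ["a domain expert", "a customer with no technical background"]:
        prompt = build_situated_prompt(
            decision="The loan application was declined due to a high debt-to-income ratio.",
            audience=audience,
        )
        print(prompt, end="\n\n")  # each prompt would then be fed to a PLM for generation
```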

* A previous version was presented at the ACL 2023 NLRSE workshop 

Exploring Part-Informed Visual-Language Learning for Person Re-Identification

Aug 04, 2023
Yin Lin, Cong Liu, Yehansen Chen, Jinshui Hu, Bing Yin, Baocai Yin, Zengfu Wang

Recently, visual-language learning has shown great potential in enhancing visual-based person re-identification (ReID). Existing visual-language learning-based ReID methods often focus on whole-body-scale image-text feature alignment while neglecting supervision of fine-grained part features. This choice simplifies the learning process but cannot guarantee within-part feature semantic consistency, thus hindering final performance. Therefore, we propose to enhance fine-grained visual features with part-informed language supervision for ReID tasks. The proposed method, named Part-Informed Visual-language Learning ($\pi$-VL), suggests that (i) a human parsing-guided prompt tuning strategy and (ii) a hierarchical fusion-based visual-language alignment paradigm play essential roles in ensuring within-part feature semantic consistency. Specifically, we combine identity labels and parsing maps to constitute pixel-level text prompts and fuse multi-stage visual features with a lightweight auxiliary head to perform fine-grained image-text alignment. As a plug-and-play and inference-free solution, our $\pi$-VL achieves substantial improvements over previous state-of-the-art methods on four commonly used ReID benchmarks, notably reporting 90.3% Rank-1 and 76.5% mAP on the most challenging MSMT17 database without bells and whistles.
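As a rough illustration of part-informed alignment, the toy PyTorch snippet below pools visual features under each parsing-map part and pulls them toward a per-part text embedding; the layer sizes and cosine-based loss are assumptions for illustration, not the exact $\pi$-VL objective.

```python
# Toy part-informed image-text alignment: pool features under each parsing mask
# and align them with a per-part text embedding. Shapes and loss are illustrative.
import torch
import torch.nn.functional as F

def part_text_alignment_loss(feat_map, parsing, text_emb):
    """feat_map: (C, H, W) visual features; parsing: (H, W) integer part labels;
    text_emb: (P, C) one text embedding per part (e.g. 'head', 'torso', ...)."""
    losses = []
    for part_id in range(text_emb.size(0)):
        mask = (parsing == part_id).float()                          # pixels of this part
        if mask.sum() == 0:
            continue
        part_feat = (feat_map * mask).sum(dim=(1, 2)) / mask.sum()   # masked average pooling
        losses.append(1 - F.cosine_similarity(part_feat, text_emb[part_id], dim=0))
    return torch.stack(losses).mean()

feat_map = torch.randn(256, 24, 12)        # backbone feature map
parsing = torch.randint(0, 4, (24, 12))    # parsing map with 4 body parts
text_emb = torch.randn(4, 256)             # prompt-tuned text embeddings per part
print(part_text_alignment_loss(feat_map, parsing, text_emb))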

* 11 pages, 5 figures 

Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation

Jul 19, 2023
Wei Jin, Haitao Mao, Zheng Li, Haoming Jiang, Chen Luo, Hongzhi Wen, Haoyu Han, Hanqing Lu, Zhengyang Wang, Ruirui Li, Zhen Li, Monica Xiao Cheng, Rahul Goutam, Haiyang Zhang, Karthik Subbian, Suhang Wang, Yizhou Sun, Jiliang Tang, Bing Yin, Xianfeng Tang

Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement; accurately understanding customer preferences is therefore essential for providing personalized recommendations. Session-based recommendation, which uses customer session data to predict the next interaction, has become increasingly popular. However, existing session datasets have limitations in item attributes, user diversity, and dataset scale, and as a result cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major product languages are English, German, Japanese, French, Italian, and Spanish. The dataset can help enhance personalization and understanding of user preferences, benefiting various existing tasks as well as enabling new ones. To test its potential, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With these tasks, we benchmark a range of algorithms on the proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition at KDD Cup 2023 that attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
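For a sense of the next-product recommendation task, the sketch below fits a simple first-order transition baseline on toy sessions shaped like Amazon-M2 sessions (lists of product IDs); it is an illustrative baseline, not a method from the paper or the competition.

```python
# Toy session-based next-product baseline: count item-to-item transitions and
# recommend the items that most often follow the session's last item.
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count item-to-item transitions within sessions."""
    trans = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            trans[prev][nxt] += 1
    return trans

def recommend_next(trans, session, k=3):
    """Rank candidates by how often they follow the last item in the session."""
    last = session[-1]
    return [item for item, _ in trans[last].most_common(k)]

sessions = [["B01", "B07", "B03"], ["B07", "B03", "B09"], ["B01", "B07", "B09"]]
trans = fit_transitions(sessions)
print(recommend_next(trans, ["B02", "B07"]))   # e.g. ['B03', 'B09']
```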

* Dataset for KDD Cup 2023, https://kddcup23.github.io/ 

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels

Jul 05, 2023
Bang Yang, Fenglin Liu, Zheng Li, Qingyu Yin, Chenyu You, Bing Yin, Yuexian Zou

Generating an informative and attractive title for a product is a crucial task in e-commerce. Most existing works follow standard multimodal natural language generation approaches, e.g., image captioning, and rely on large-scale human-labelled datasets to train desirable models. However, for novel products, especially those in a different domain, little labelled data exists. In this paper, we propose a prompt-based approach, the Multimodal Prompt Learning framework, to accurately and efficiently generate titles for novel products with limited labels. We observe that the core challenges of novel product title generation are understanding novel product characteristics and generating titles in a novel writing style. To this end, we build a set of multimodal prompts from different modalities to preserve the corresponding characteristics and writing styles of novel products. As a result, with extremely limited labels for training, the proposed method can retrieve the multimodal prompts to generate desirable titles for novel products. Experiments and analyses are conducted on five novel product categories under both in-domain and out-of-domain settings. The results show that, with only 1% of the downstream labelled data for training, our approach achieves the best few-shot results and is even competitive with fully supervised methods trained on 100% of the training data; with the full labelled data, our method achieves state-of-the-art results.
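The snippet below sketches one plausible reading of prompt retrieval for a novel product: pick the learned prompt vectors most similar to the product's features and prepend them to the decoder input. The pool size and nearest-neighbour retrieval are assumptions for illustration, not the exact mechanism of the framework.

```python
# Toy prompt retrieval by feature similarity; the prompt pool and retrieval rule
# are illustrative assumptions.
import torch
import torch.nn.functional as F

def retrieve_prompts(query_feat, prompt_pool, top_k=2):
    """Pick the top-k prompt vectors most similar to the query feature."""
    sims = F.cosine_similarity(prompt_pool, query_feat.unsqueeze(0), dim=1)
    idx = sims.topk(top_k).indices
    return prompt_pool[idx]                      # these would be prepended to the decoder input

image_feat = torch.randn(512)                    # features of a novel product image
visual_prompt_pool = torch.randn(16, 512)        # pool of learned visual prompts
prompts = retrieve_prompts(image_feat, visual_prompt_pool)
print(prompts.shape)                             # torch.Size([2, 512])
```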

* Accepted to ACL Findings 2023 

Weakly Supervised Scene Text Generation for Low-resource Languages

Jun 27, 2023
Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum, Bing Yin, Cong Liu, Yue Lu

A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient data can be labor-intensive and costly, particularly for low-resource languages. Automatically generating text data has shown promise in alleviating this problem, but existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision. The proposed method can generate a large number of scene text images with diverse backgrounds and font styles through cross-language generation. Our method disentangles the content and style features of scene text images, with the former representing textual information and the latter representing characteristics such as font, alignment, and background. To preserve the complete content structure of generated images, we introduce an integrated attention module, and to bridge the style gap between different languages, we incorporate a pre-trained font classifier. We evaluate our method using state-of-the-art scene text recognition models. Experiments demonstrate that the generated scene text significantly improves scene text recognition accuracy and helps achieve higher accuracy when complemented with other generative methods.
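A bare-bones sketch of the content/style disentanglement idea is shown below: one encoder reads rendered glyphs (content), another reads a real scene-text crop (style), and a decoder combines the two. The tiny linear encoders and fixed 32x32 images are placeholders for the paper's actual architecture.

```python
# Toy content/style disentanglement for scene-text image generation.
# Encoder/decoder sizes are illustrative placeholders.
import torch
import torch.nn as nn

class DisentangledGenerator(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.content_enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, dim))  # what the text says
        self.style_enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, dim))    # font, background, alignment
        self.decoder = nn.Linear(2 * dim, 32 * 32)

    def forward(self, content_img, style_img):
        z = torch.cat([self.content_enc(content_img), self.style_enc(style_img)], dim=1)
        return self.decoder(z).view(-1, 1, 32, 32)   # generated image in the target style

gen = DisentangledGenerator()
content = torch.randn(1, 1, 32, 32)   # rendered glyphs of the target (low-resource) text
style = torch.randn(1, 1, 32, 32)     # a real scene-text crop from a resource-rich language
print(gen(content, style).shape)      # torch.Size([1, 1, 32, 32])
```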

A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy

Jun 14, 2023
Enyan Dai, Limeng Cui, Zhengyang Wang, Xianfeng Tang, Yinghan Wang, Monica Cheng, Bing Yin, Suhang Wang

Graph Neural Networks (GNNs) have achieved great success in modeling graph-structured data. However, recent works show that GNNs are vulnerable to adversarial attacks, which can fool a GNN model into making the attacker's desired predictions. In addition, the training data of GNNs can be leaked under membership inference attacks. This largely hinders the adoption of GNNs in high-stakes domains such as e-commerce, finance, and bioinformatics. Though efforts have been made toward robust predictions and membership privacy protection, they generally fail to consider robustness and membership privacy simultaneously. Therefore, in this work, we study the novel problem of developing robust and membership privacy-preserving GNNs. Our analysis shows that the Information Bottleneck (IB) can help filter out noisy information and regularize the predictions on labeled samples, which benefits both robustness and membership privacy. However, structural noise and the lack of labels in node classification challenge the deployment of IB on graph-structured data. To mitigate these issues, we propose a novel graph information bottleneck framework that alleviates structural noise with a neighbor bottleneck. Pseudo labels are also incorporated into the optimization to minimize the gap between predictions on the labeled and unlabeled sets for membership privacy. Extensive experiments on real-world datasets demonstrate that our method gives robust predictions while preserving membership privacy.
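The following toy loss illustrates the training signal described above: a supervised loss on labeled nodes, a pseudo-label loss on unlabeled nodes, and a penalty on the gap between the two as a crude membership-privacy proxy. The specific gap measure is an assumption for illustration, not the paper's formulation.

```python
# Toy objective: supervised + pseudo-label losses plus a gap penalty between them.
import torch
import torch.nn.functional as F

def gib_style_loss(logits, labels, labeled_mask, pseudo_labels, gap_weight=0.1):
    sup = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    unsup = F.cross_entropy(logits[~labeled_mask], pseudo_labels[~labeled_mask])
    gap = (sup - unsup).abs()          # penalize a large labeled/unlabeled loss gap
    return sup + unsup + gap_weight * gap

logits = torch.randn(6, 3)                       # GNN outputs for 6 nodes, 3 classes
labels = torch.randint(0, 3, (6,))
pseudo_labels = torch.randint(0, 3, (6,))
labeled_mask = torch.tensor([1, 1, 1, 0, 0, 0], dtype=torch.bool)
print(gib_style_loss(logits, labels, labeled_mask, pseudo_labels))
```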

Knowledge Graph Reasoning over Entities and Numerical Values

Jun 02, 2023
Jiaxin Bai, Chen Luo, Zheng Li, Qingyu Yin, Bing Yin, Yangqiu Song

A complex logic query in a knowledge graph is a query expressed in logical form that conveys a complex meaning, such as "Where did the Canadian Turing Award winner graduate from?". Knowledge graph reasoning-based applications, such as dialogue systems and interactive search engines, rely on answering such queries as a fundamental task. In most knowledge graphs, edges describe either relationships between entities or their associated attribute values, and an attribute value can be categorical or numerical, such as dates, years, or sizes. However, existing complex query answering (CQA) methods simply treat numerical values in the same way as entities. This makes certain queries difficult to answer, such as "Which Australian Pulitzer Prize winner was born before 1927?" or "Which drug is a pain reliever and has fewer side effects than Paracetamol?". In this work, inspired by recent advances in numerical encoding and knowledge graph reasoning, we propose numerical complex query answering. In this task, we introduce new numerical variables and operations to describe queries involving numerical attribute values. To address the difference between entities and numerical values, we also propose the Number Reasoning Network (NRN) framework, which encodes entities and numerical values into separate structures. During the numerical encoding process, NRN employs a parameterized density function to encode the distribution of numerical values; during the entity encoding process, NRN uses established query encoding methods for the original CQA problem. Experimental results show that NRN consistently improves various query encoding methods on three different knowledge graphs and achieves state-of-the-art results.
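As a toy illustration of density-based numerical encoding, the snippet below represents a birth-year attribute as a Gaussian and scores the constraint "born before 1927" with the normal CDF; the Gaussian form and the scoring rule are assumptions for illustration, not the exact NRN design.

```python
# Toy density encoding of a numerical attribute and CDF-based constraint scoring.
import math

def encode_value(value, spread=5.0):
    """Represent a numerical value as the (mean, std) of a Gaussian density."""
    return {"mean": float(value), "std": spread}

def prob_less_than(density, threshold):
    """P(x < threshold) under the encoded Gaussian, via the normal CDF."""
    z = (threshold - density["mean"]) / (density["std"] * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))

birth_year = encode_value(1916)                     # e.g. an entity's birth-year attribute
print(round(prob_less_than(birth_year, 1927), 3))   # high score -> satisfies "born before 1927"
```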

Graph Reasoning for Question Answering with Triplet Retrieval

May 30, 2023
Shiyang Li, Yifan Gao, Haoming Jiang, Qingyu Yin, Zheng Li, Xifeng Yan, Chao Zhang, Bing Yin

Answering complex questions often requires reasoning over knowledge graphs (KGs). State-of-the-art methods typically use the entities in a question to retrieve local subgraphs, which are fed into a KG encoder, e.g., a graph neural network (GNN), to model their local structure, and are then integrated into language models for question answering. However, this paradigm constrains retrieved knowledge to local subgraphs and discards more diverse triplets buried in the KG that are disconnected from those subgraphs but useful for question answering. In this paper, we propose a simple yet effective method that first retrieves the most relevant triplets from the KG, reranks them, and concatenates them with the question as input to a language model. Extensive results on both the CommonsenseQA and OpenBookQA datasets show that our method outperforms the state of the art by up to 4.6% absolute accuracy.
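A simplified sketch of the retrieve-rerank-concatenate pipeline is below, with a crude token-overlap scorer standing in for the actual retriever and reranker models.

```python
# Toy retrieve-and-concatenate over KG triplets; the overlap scorer is a stand-in
# for a learned retriever/reranker.
def score(question, triplet):
    """Crude relevance score: word overlap between the question and the triplet."""
    q_tokens = set(question.lower().split())
    t_tokens = set(" ".join(triplet).lower().split())
    return len(q_tokens & t_tokens)

def retrieve_and_concat(question, kg_triplets, top_k=2):
    ranked = sorted(kg_triplets, key=lambda t: score(question, t), reverse=True)[:top_k]
    context = " ".join(f"{h} {r} {t}." for h, r, t in ranked)
    return f"{context} Question: {question}"       # input sequence for the language model

kg = [("crab", "is a", "crustacean"), ("crab", "lives in", "the sea"), ("dog", "is a", "mammal")]
print(retrieve_and_concat("Where does a crab live?", kg))
```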

* Findings of ACL 2023 

CCGen: Explainable Complementary Concept Generation in E-Commerce

May 19, 2023
Jie Huang, Yifan Gao, Zheng Li, Jingfeng Yang, Yangqiu Song, Chao Zhang, Zining Zhu, Haoming Jiang, Kevin Chen-Chuan Chang, Bing Yin

We propose and study Complementary Concept Generation (CCGen): given a concept of interest, e.g., "Digital Cameras", generate a list of complementary concepts, e.g., 1) Camera Lenses, 2) Batteries, 3) Camera Cases, 4) Memory Cards, 5) Battery Chargers. CCGen is beneficial for various applications such as query suggestion and item recommendation, especially in the e-commerce domain. To solve CCGen, we train language models to generate ranked lists of concepts with a two-step training strategy. We also teach the models to generate explanations by incorporating explanations distilled from large teacher models. Extensive experiments and analysis demonstrate that our model can generate high-quality concepts complementary to the input concept while producing explanations to justify the predictions.
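The snippet below sketches one way the generation target could be serialized: a numbered, ranked list of concepts optionally followed by distilled explanations. The serialization format is an assumption for illustration, not the paper's exact scheme.

```python
# Toy serialization of a ranked concept list with optional distilled explanations.
def build_target(concepts, explanations=None):
    lines = [f"{i}) {c}" for i, c in enumerate(concepts, start=1)]
    if explanations:                               # second step: append teacher-distilled explanations
        lines += [f"{c}: {e}" for c, e in explanations.items()]
    return "\n".join(lines)

concepts = ["Camera Lenses", "Batteries", "Camera Cases"]
explanations = {"Camera Lenses": "lenses extend what a digital camera can shoot"}
print(build_target(concepts, explanations))
```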

SCOTT: Self-Consistent Chain-of-Thought Distillation

May 03, 2023
Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, Xiang Ren

Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generating free-text rationales for their predictions via chain-of-thought (CoT) prompting. While CoT can yield dramatically improved performance, such gains are only observed for sufficiently large LMs. Even more concerning, there is little guarantee that the generated rationales are consistent with the LM's predictions or faithfully justify its decisions. In this work, we propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger. To form better supervision, we elicit rationales supporting the gold answers from a large LM (teacher) via contrastive decoding, which encourages the teacher to generate tokens that become more plausible only when the answer is considered. To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective, which prevents the student from ignoring the rationales and making inconsistent predictions. Experiments show that, while yielding comparable end-task performance, our method can generate CoT rationales that are more faithful than baselines. Further analysis suggests that such a model respects the rationales more when making decisions; thus, we can further improve its performance by refining its rationales.
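The schematic below illustrates the contrastive-decoding idea: prefer the next token whose probability rises most when the teacher also sees the gold answer. The `next_token_logprobs` function is a hypothetical stand-in for a teacher LM call, and its dummy scoring is only for demonstration.

```python
# Schematic contrastive decoding: pick the token whose log-probability gains most
# from conditioning on the gold answer. The scorer below is a hypothetical dummy.
def next_token_logprobs(context, vocab):
    """Stand-in for a teacher LM; here a dummy that favours words already in the context."""
    return {tok: (0.0 if tok in context else -2.0) for tok in vocab}

def contrastive_pick(with_answer_ctx, without_answer_ctx, vocab):
    """Choose the token whose log-probability gains most from seeing the answer."""
    p_with = next_token_logprobs(with_answer_ctx, vocab)
    p_without = next_token_logprobs(without_answer_ctx, vocab)
    return max(vocab, key=lambda tok: p_with[tok] - p_without[tok])

vocab = ["warm", "cold", "because"]
print(contrastive_pick("Q: ... A: warm because", "Q: ...", vocab))   # 'warm'
```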

* 11 pages, 8 figures. Accepted to ACL 2023 