Xinyu Zhu

AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models

Aug 12, 2023
Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years. However, research is still stymied by the scarcity of training data. To alleviate this problem, we propose AutoConv for synthetic conversation generation, which takes advantage of the few-shot learning ability and generation capacity of large language models (LLMs). Specifically, we formulate conversation generation as a language modeling task, then fine-tune an LLM on a few human conversations to capture the characteristics of the information-seeking process, and use it to generate high-quality synthetic conversations. Experimental results on two frequently used datasets verify that AutoConv yields substantial improvements over strong baselines and alleviates the dependence on human annotation. In addition, we provide several analyses to inform future research.
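
A minimal sketch of the idea, with a small Hugging Face causal LM standing in for the paper's LLM: each human conversation is serialized into plain text (the document-plus-turn format below is an illustrative assumption, not the paper's exact prompt), the model is fine-tuned on these sequences with the standard language-modeling objective, and new synthetic conversations are then sampled.

```python
# Hedged sketch of AutoConv-style generation: serialize a few human
# information-seeking conversations as plain text, fine-tune a causal LM on
# them, then sample synthetic conversations. Model choice and turn format
# are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def serialize(document, turns):
    # Condition on the grounding document, then list the dialogue turns.
    lines = [f"Document: {document}"]
    lines += [f"{role}: {utt}" for role, utt in turns]
    return "\n".join(lines) + tokenizer.eos_token

example = serialize(
    "Paris is the capital and most populous city of France.",
    [("User", "What is the capital of France?"),
     ("System", "It is Paris, which is also France's largest city.")],
)
# Fine-tuning on such strings is ordinary causal-LM training (omitted here).
# After fine-tuning, prompting with only a document yields a synthetic
# conversation continuation:
prompt = tokenizer("Document: The Nile is a river in Africa.\nUser:",
                   return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=60, do_sample=True,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```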

* Accepted to ACL 2023 Main Conference (Short) 

Question Answering as Programming for Solving Time-Sensitive Questions

May 23, 2023
Xinyu Zhu, Cheng Yang, Bei Chen, Siheng Li, Jian-Guang Lou, Yujiu Yang

In this work, we apply Large Language Models (LLMs) to reframe the Question Answering task as Programming (QAaP). Due to the inherently dynamic nature of the real world, factual questions frequently involve a symbolic constraint: time. Solving these questions requires not only extensive world knowledge but also advanced reasoning ability to satisfy the temporal constraints. Despite the remarkable intelligence LLMs exhibit on various NLP tasks, our experiments reveal that such questions continue to pose a significant challenge to existing LLMs. To solve these time-sensitive factual questions, and considering that modern LLMs possess superior ability in both natural language understanding and programming, we leverage LLMs to represent diversely expressed text as well-structured code, and thereby capture the desired knowledge together with its underlying symbolic constraint.
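
To make the reframing concrete, here is a hedged sketch in plain Python: the fact schema and entries are invented for illustration, and in the paper an LLM produces such structured representations from free text rather than having them hand-coded.

```python
# Hedged sketch of the QAaP idea: represent diversely expressed text as
# well-structured code (here, dicts with explicit validity spans) so a
# temporal constraint can be checked programmatically.
from datetime import date

# Structured facts an LLM might extract from free text (illustrative only).
facts = [
    {"subject": "Lionel Messi", "relation": "plays_for",
     "object": "FC Barcelona",
     "start": date(2004, 10, 16), "end": date(2021, 8, 5)},
    {"subject": "Lionel Messi", "relation": "plays_for",
     "object": "Paris Saint-Germain",
     "start": date(2021, 8, 10), "end": date(2023, 6, 30)},
]

# The question, reframed as code: a target relation plus a time constraint.
query = {"subject": "Lionel Messi", "relation": "plays_for",
         "at_time": date(2022, 1, 1)}

def answer(query, facts):
    # Return the object of the matching fact whose validity span
    # satisfies the temporal constraint.
    for f in facts:
        if (f["subject"] == query["subject"]
                and f["relation"] == query["relation"]
                and f["start"] <= query["at_time"] <= f["end"]):
            return f["object"]
    return None

print(answer(query, facts))  # -> Paris Saint-Germain
```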

NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension

May 06, 2023
Yuxiang Zhang, Junjie Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana

Named-entity recognition (NER) detects text spans with predefined semantic labels and is an essential building block of natural language processing (NLP). Notably, recent NER research focuses on utilizing massive extra data, including pre-training corpora and search engines. However, these methods suffer from the high costs of data collection and pre-training, and from the additional training required for data retrieved from search engines. To address these challenges, we completely frame NER as a machine reading comprehension (MRC) problem, called NER-to-MRC, leveraging MRC's ability to exploit existing data efficiently. Although several prior works have employed MRC-based solutions for the NER problem, two challenges persist: i) reliance on manually designed prompts; ii) limited approaches to data reconstruction, which fail to achieve performance on par with methods utilizing extensive additional data. Our NER-to-MRC conversion therefore consists of two components: i) transforming the NER task into a form the model can solve efficiently with MRC; ii) applying an MRC reasoning strategy to the model. We experiment on 6 benchmark datasets from three domains and achieve state-of-the-art performance without external data, with up to 11.24% improvement on the WNUT-16 dataset.
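
As a rough illustration of the NER-as-MRC framing (not the paper's actual model or prompt design), one can pose an entity type as a query to an off-the-shelf extractive QA model and read off the predicted span; the query wording and model choice below are assumptions.

```python
# Hedged sketch: frame one entity type as an MRC query and extract the
# answer span with a SQuAD-trained QA model.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

context = "Barack Obama was born in Honolulu."
query = "Which words refer to a person?"  # one query per entity type

inputs = tokenizer(query, context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Take the highest-scoring start/end positions and decode the span.
start = out.start_logits.argmax().item()
end = out.end_logits.argmax().item()
span = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(span)  # ideally "Barack Obama"
```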

Solving Math Word Problem via Cooperative Reasoning induced Language Models

Oct 28, 2022
Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang

Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those requiring high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus the fast adaptivity of humans. We note that human reasoning follows a dual framework consisting of an immediate-reaction system (system 1) and a deliberate reasoning system (system 2), where the entire reasoning process is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers supervise the evaluation to provide reliable feedback to the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvements over state-of-the-art methods, with up to a 9.8% increase over the best baselines.
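
A minimal sketch of the generator/verifier interaction at inference time, with stand-in functions in place of the paper's trained PLMs (in CoRe, both roles are learned models and the verifier's feedback also supervises training):

```python
# Hedged sketch of a sample-then-verify loop: system 1 (generator) samples
# candidate reasoning paths, system 2 (verifier) scores them, and the
# best-scoring path is selected. Both functions are illustrative stand-ins.
import random

def generator(question, n_paths=5):
    # Stand-in: sample candidate solution paths (the paper samples from a PLM).
    return [f"path-{i}: answer={random.choice([40, 42, 44])}"
            for i in range(n_paths)]

def verifier(question, path):
    # Stand-in: score a path's reliability (the paper uses trained verifiers).
    return 1.0 if "answer=42" in path else random.random() * 0.5

def solve(question):
    paths = generator(question)
    scored = [(verifier(question, p), p) for p in paths]
    return max(scored)[1]  # pick the most reliable reasoning path

print(solve("What is 6 * 7?"))
```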

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

Oct 18, 2022
Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai

We propose a new paradigm for zero-shot learners that is format agnostic, i.e., compatible with any format and applicable to a range of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis. Zero-shot learning aims to train a model on a given task such that it can address new tasks without any additional training. Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN. It not only adds generalization ability to models but also significantly reduces the number of parameters. Our method offers efficient training and deployment, shows state-of-the-art performance on several benchmarks, and produces satisfactory results on tasks such as natural language inference and text classification. Our model achieves this success with only 235M parameters, substantially fewer than state-of-the-art models with billions of parameters. The code and pre-trained models are available at https://github.com/IDEA-CCNL/Fengshenbang-LM .
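
To illustrate the paradigm (not the paper's actual architecture or training, for which see the repository above), here is a hedged sketch that recasts a sentiment task as multiple choice and scores each option by its log-likelihood under an off-the-shelf causal LM:

```python
# Hedged sketch of the unified multiple-choice idea: any task becomes a
# prompt with candidate options, and the model scores each option. Here an
# ordinary causal LM's mean token log-likelihood serves as the score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_score(prompt, option):
    # Mean log-likelihood of the option's tokens given the prompt.
    ids = tokenizer(prompt + " " + option, return_tensors="pt")["input_ids"]
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    # The token at position i is predicted by the logits at position i - 1.
    pos = torch.arange(prompt_len - 1, ids.shape[1] - 1)
    token_lp = logprobs[pos, ids[0, prompt_len:]]
    return token_lp.mean().item()

# A sentiment task rewritten as multiple choice.
prompt = "Review: The movie was fantastic. The sentiment is"
options = ["positive", "negative"]
print(max(options, key=lambda o: option_score(prompt, o)))
```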

* EMNLP 2022 

Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

Sep 07, 2022
Junjie Wang, Yuxiang Zhang, Lin Zhang, Ping Yang, Xinyu Gao, Ziwei Wu, Xiaoqun Dong, Junqing He, Jianheng Zhuo, Qi Yang, Yongfeng Huang, Xiayu Li, Yanghan Wu, Junyu Lu, Xinyu Zhu, Weifeng Chen, Ting Han, Kunhao Pan, Rui Wang, Hao Wang, Xiaojun Wu, Zhongshen Zeng, Chongpei Chen, Ruyi Gan, Jiaxing Zhang

Foundation models have become part of the fundamental infrastructure of artificial intelligence, paving the way toward general intelligence. However, the reality presents two urgent challenges: existing foundation models are dominated by the English-language community, and users are often given limited resources and thus cannot always use foundation models. To support the development of the Chinese-language community, we introduce an open-source project called Fengshenbang, led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and more. We wrap all of these in three sub-projects: the Fengshenbang Model, the Fengshen Framework, and the Fengshen Benchmark. The Fengshenbang open-source roadmap aims to re-evaluate the open-source community of Chinese pre-trained large-scale models, promoting the development of the entire Chinese large-scale model community. We also want to build a user-centered open-source ecosystem that allows individuals to access the desired models to match their computing resources. Furthermore, we invite companies, colleges, and research institutions to collaborate with us in building this large-scale open-source model-based ecosystem. We hope that this project will be the foundation of Chinese cognitive intelligence.

Molecular Substructure-Aware Network for Drug-Drug Interaction Prediction

Aug 25, 2022
Xinyu Zhu, Yongliang Shen, Weiming Lu

Concomitant administration of drugs can cause drug-drug interactions (DDIs). Some drug combinations are beneficial, but others may cause negative effects that were previously unrecorded. Previous works on DDI prediction usually rely on hand-engineered domain knowledge, which is laborious to obtain. In this work, we propose a novel model, the Molecular Substructure-Aware Network (MSAN), to effectively predict potential DDIs from the molecular structures of drug pairs. We adopt a Transformer-like substructure extraction module to acquire a fixed number of representative vectors associated with various substructure patterns of the drug molecule. The interaction strength between the two drugs' substructures is then captured by a similarity-based interaction module. We also perform a substructure-dropping augmentation before graph encoding to alleviate overfitting. Experimental results on a real-world dataset show that our proposed model achieves state-of-the-art performance. A case study further shows that the predictions of our model are highly interpretable.
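
A hedged PyTorch sketch of the two components described above; dimensions, head counts, and the pooled scoring head are illustrative assumptions rather than the paper's exact design:

```python
# Sketch of MSAN's two stages: learned queries attend over atom features to
# extract a fixed number of substructure vectors per drug, then a
# similarity-based module scores the pairwise interaction.
import torch
import torch.nn as nn

class SubstructureExtractor(nn.Module):
    def __init__(self, dim=64, n_substructures=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_substructures, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, atom_feats):  # atom_feats: (batch, n_atoms, dim)
        q = self.queries.unsqueeze(0).expand(atom_feats.size(0), -1, -1)
        # Learned queries attend over atom features, yielding a fixed
        # number of representative substructure vectors per drug.
        sub, _ = self.attn(q, atom_feats, atom_feats)
        return sub  # (batch, n_substructures, dim)

class InteractionModule(nn.Module):
    def forward(self, sub_a, sub_b):
        # Pairwise similarity between the two drugs' substructure vectors.
        sim = torch.bmm(sub_a, sub_b.transpose(1, 2))
        return sim.flatten(1).mean(dim=1)  # pooled interaction strength

extractor = SubstructureExtractor()
interact = InteractionModule()
drug_a = torch.randn(2, 30, 64)  # e.g. 30 atoms with 64-dim features
drug_b = torch.randn(2, 25, 64)
score = interact(extractor(drug_a), extractor(drug_b))
print(score.shape)  # torch.Size([2])
```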

* Accepted to CIKM 2022 (Short), camera ready version 

REAPS: Towards Better Recognition of Fine-grained Images by Region Attending and Part Sequencing

Aug 06, 2019
Peng Zhang, Xinyu Zhu, Zhanzhan Cheng, Shuigeng Zhou, Yi Niu

Fine-grained image recognition has been a hot research topic in computer vision due to its various applications. The state of the art comprises part/region-based approaches that first localize discriminative parts/regions and then learn their fine-grained features. However, these approaches have some inherent drawbacks: 1) the discriminative feature representation of an object is prone to disturbance by a complicated background; 2) it is unreasonable and inflexible to fix the number of salient parts, because the intended parts may be unavailable under certain circumstances due to occlusion or incompleteness; and 3) the spatial correlation among different salient parts has not been thoroughly exploited, if not completely neglected. To overcome these drawbacks, in this paper we propose a new, simple yet robust method that builds a part sequence model on the attended object region. Concretely, we first alleviate the background effect by using a region attention mechanism to generate the attended region from the original image. Then, instead of localizing different salient parts and extracting their features separately, we learn the part representation implicitly by applying a mapping function to the serialized features of the object. Finally, we combine the region-attending network and the part-sequence learning network into a unified framework that can be trained end-to-end with only image-level labels. Extensive experiments on three fine-grained benchmarks show that the proposed method achieves state-of-the-art performance.
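
A speculative PyTorch sketch of the pipeline described above, using stand-in modules (a single conv layer for the backbone, an LSTM as the mapping function over serialized features) rather than the paper's actual components:

```python
# Sketch of the two-stage idea: a region attention branch softly suppresses
# the background of a CNN feature map, then a sequence model over the
# serialized (flattened) spatial features learns part correlations
# implicitly, trained end-to-end from image-level labels.
import torch
import torch.nn as nn

class REAPSSketch(nn.Module):
    def __init__(self, channels=64, n_classes=200):
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, kernel_size=7, stride=4)
        self.region_attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.part_seq = nn.LSTM(channels, channels, batch_first=True)
        self.classifier = nn.Linear(channels, n_classes)

    def forward(self, images):  # (batch, 3, H, W)
        feats = self.backbone(images)                  # (B, C, h, w)
        attn = torch.sigmoid(self.region_attn(feats))  # attention map
        attended = feats * attn                        # suppress background
        # Serialize spatial positions into a sequence and model their
        # correlations instead of localizing explicit parts.
        seq = attended.flatten(2).transpose(1, 2)      # (B, h*w, C)
        out, _ = self.part_seq(seq)
        return self.classifier(out[:, -1])             # image-level logits

model = REAPSSketch()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 200])
```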

* PRCV 2019 