Zhiyuan Liu

Tsinghua University

ProAgent: From Robotic Process Automation to Agentic Process Automation

Nov 02, 2023

Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin (+2 more)

Abstract:From ancient water wheels to robotic process automation (RPA), automation technology has evolved throughout history to liberate human beings from arduous tasks. Yet RPA struggles with tasks that need human-like intelligence, especially the elaborate design involved in workflow construction and the dynamic decision-making involved in workflow execution. As Large Language Models (LLMs) have begun to exhibit human-like intelligence, this paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm that uses LLM-based agents for advanced automation by offloading to agents the human labor associated with construction and execution. We then instantiate ProAgent, an LLM-based agent designed to craft workflows from human instructions and make intricate decisions by coordinating specialized agents. Empirical experiments detail its workflow construction and execution procedure, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents. Our code is public at https://github.com/OpenBMB/ProAgent.

* Work in progress

Distributionally Robust Unsupervised Dense Retrieval Training on Web Graphs

Oct 26, 2023

Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong

Abstract:This paper introduces Web-DRO, an unsupervised dense retrieval model, which clusters documents based on web structures and reweights the groups during contrastive training. Specifically, we first leverage web graph links and contrastively train an embedding model for clustering anchor-document pairs. Then we use Group Distributionally Robust Optimization (Group DRO) to reweight different clusters of anchor-document pairs, which guides the model to assign more weight to groups with higher contrastive loss and pay more attention to the worst case during training. Our experiments on MS MARCO and BEIR show that our model, Web-DRO, significantly improves the retrieval effectiveness in unsupervised scenarios. A comparison of clustering techniques shows that training on the web graph combined with URL information reaches the best clustering performance. Further analysis confirms that group weights are stable and valid, indicating consistent model preferences as well as effective up-weighting of valuable groups and down-weighting of uninformative ones. The code of this paper can be obtained from https://github.com/OpenMatch/Web-DRO.

* 9 pages, 5 figures, 5 tables
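
The reweighting step described in the Web-DRO abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so the code below assumes the commonly used exponentiated-gradient form of group DRO; the function names, the step size, and the example losses are all hypothetical and are not taken from the Web-DRO code.

```python
import math


def update_group_weights(weights, group_losses, step_size=0.1):
    """Assumed exponentiated-gradient update: groups with higher loss gain weight."""
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    # Renormalize so the weights remain a probability distribution over groups.
    return [s / total for s in scaled]


def reweighted_loss(weights, group_losses):
    """Training objective: weighted sum of the per-group contrastive losses."""
    return sum(w * loss for w, loss in zip(weights, group_losses))


# Three hypothetical clusters of anchor-document pairs with made-up losses.
weights = [1 / 3, 1 / 3, 1 / 3]
losses = [0.2, 1.5, 0.7]
new_weights = update_group_weights(weights, losses)
```

After one update the cluster with the highest loss carries the most weight, which is the "pay more attention to the worst case" behavior the abstract describes.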

MUSER: A Multi-View Similar Case Retrieval Dataset

Oct 24, 2023

Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen

Abstract:Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insight into the underlying reasoning process. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.

* Accepted by CIKM 2023 Resource Track

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Oct 24, 2023

Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun (+1 more)

Abstract:Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.

* Accepted by Findings of EMNLP
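
The sequence-length reduction described in the Variator abstract, where multiple hidden vectors are compressed into one, can be sketched as follows. This is only an illustration: Variator's plugins are trained neural layers, whereas the sketch substitutes plain averaging, and the function name `compress_sequence` and the group size `k` are hypothetical.

```python
def compress_sequence(hidden, k):
    """Merge every k consecutive hidden vectors into one by averaging.

    Averaging is a stand-in for the learned compression plugin; it only
    demonstrates how the sequence length shrinks by roughly a factor of k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    compressed = []
    for start in range(0, len(hidden), k):
        group = hidden[start:start + k]  # the last group may be shorter than k
        dim = len(group[0])
        compressed.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return compressed


# Five hypothetical 2-dimensional hidden vectors compressed with k = 2.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
short = compress_sequence(hidden, 2)
```

Because the frozen model then processes the shorter sequence, choosing a different `k` at run time loosely corresponds to selecting a plugin with a different acceleration ratio.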

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

Oct 23, 2023

Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Abstract:Masked graph modeling (MGM) excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, previous MGM studies focus extensively on graph masking and the encoder, while there is limited understanding of the tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the MGM's reconstruction targets. Further, we explore the potential of adopting an expressive decoder in MGM. Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. Finally, we propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy. We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT.

* NeurIPS 2023. 10 pages
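
The masking and reconstruction-target parts of the scheme in the abstract above can be sketched in a few lines. The sketch assumes the simplest node-level tokenizer (each atom symbol is its own token) rather than the paper's SGT, and it omits the encoder and decoder entirely; `mask_graph`, `reconstruction_accuracy`, and the example molecule are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_graph(node_tokens, mask_ratio, rng):
    """Corrupt a token list with masks and return the reconstruction targets."""
    if not node_tokens:
        raise ValueError("node_tokens must not be empty")
    num_masked = max(1, round(len(node_tokens) * mask_ratio))
    masked_ids = sorted(rng.sample(range(len(node_tokens)), num_masked))
    corrupted = list(node_tokens)
    targets = {}
    for i in masked_ids:
        targets[i] = node_tokens[i]  # the token the decoder should recover
        corrupted[i] = MASK
    return corrupted, targets


def reconstruction_accuracy(predictions, targets):
    """Fraction of masked positions whose original token was recovered."""
    correct = sum(predictions.get(i) == tok for i, tok in targets.items())
    return correct / len(targets)


# Node-level tokens for a hypothetical six-atom molecule.
atoms = ["C", "C", "O", "N", "C", "H"]
corrupted, targets = mask_graph(atoms, 0.5, random.Random(0))
```

Swapping the atom symbols for subgraph-level tokens would change only what `targets` contains, which is the sense in which the tokenizer defines the reconstruction targets.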

Unlock Multi-Modal Capability of Dense Retrieval via Visual Module Plugin

Oct 21, 2023

Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu

Abstract:This paper proposes Multi-modAl Retrieval model via Visual modulE pLugin (MARVEL) to learn an embedding space for queries and multi-modal documents to conduct retrieval. MARVEL encodes queries and multi-modal documents with a unified encoder model, which helps to alleviate the modality gap between images and texts. Specifically, we enable the image understanding ability of a well-trained dense retriever, T5-ANCE, by incorporating the image features encoded by the visual module as its inputs. To facilitate the multi-modal retrieval tasks, we build the ClueWeb22-MM dataset based on the ClueWeb22 dataset, which regards anchor texts as queries and extracts the related text and image documents from anchor-linked web pages. Our experiments show that MARVEL significantly outperforms the state-of-the-art methods on the multi-modal retrieval datasets WebQA and ClueWeb22-MM. Our further analyses show that the visual module plugin method is tailored to enable the image understanding ability for an existing dense retrieval model. Besides, we also show that the language model has the ability to extract image semantics from image encoders and adapt the image features in the input space of language models. All codes are available at https://github.com/OpenMatch/MARVEL.

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

Oct 20, 2023

Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, Xiang Wang

Abstract:Predicting chemical reactions, a fundamental challenge in chemistry, involves forecasting the resulting products from a given reaction process. Conventional techniques, notably those employing Graph Neural Networks (GNNs), are often limited by insufficient training data and their inability to utilize textual information, undermining their applicability in real-world applications. In this work, we propose ReLM, a novel framework that leverages the chemical knowledge encoded in language models (LMs) to assist GNNs, thereby enhancing the accuracy of real-world chemical reaction predictions. To further enhance the model's robustness and interpretability, we incorporate the confidence score strategy, enabling the LMs to self-assess the reliability of their predictions. Our experimental results demonstrate that ReLM improves the performance of state-of-the-art GNN-based methods across various chemical reaction datasets, especially in out-of-distribution settings. Codes are available at https://github.com/syr-cn/ReLM.

Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

Oct 20, 2023

Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan Yao, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie Zhou

Abstract:With pre-trained language models (PLMs) thriving and widely verified across various NLP tasks, pioneering efforts explore how the general textual information in PLMs can cooperate with the personalized behavioral information in users' historical behavior sequences to enhance sequential recommendation (SR). However, despite the commonalities of input format and task goal, there are huge gaps between the behavioral and textual information, which obstruct thoroughly modeling SR as language modeling via PLMs. To bridge the gap, we propose a novel Unified pre-trained language model enhanced sequential recommendation (UPSR), aiming to build a unified pre-trained recommendation model for multi-domain recommendation tasks. We formally design five key indicators, namely naturalness, domain consistency, informativeness, noise & ambiguity, and text length, to guide the text->item adaptation and behavior sequence->text sequence adaptation differently for pre-training and fine-tuning stages, which are essential but under-explored by previous works. In experiments, we conduct extensive evaluations on seven datasets with both tuning and zero-shot settings and achieve the overall best performance. Comprehensive model analyses also provide valuable insights for behavior modeling via PLMs, shedding light on large pre-trained recommendation models. The source codes will be released in the future.

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

Oct 19, 2023

Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Abstract:Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, parameter sharing does not alleviate the computational burden of inference, which impedes its practicality in situations with stringent latency requirements or limited computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially shared models capable of achieving even greater inference acceleration. The experimental results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs, providing novel insights into more efficient utilization of parameter-shared models in resource-constrained settings.

* EMNLP 2023 Findings

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Oct 19, 2023

Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Abstract:Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception - a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector. Specifically, the cross-modal projector is implemented as a Q-Former to connect a graph encoder's representation space and an LM's text space. Further, MolCA employs a uni-modal adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks. Unlike previous studies that couple an LM with a graph encoder via cross-modal contrastive learning, MolCA retains the LM's ability of open-ended text generation and augments it with 2D graph information. To showcase its effectiveness, we extensively benchmark MolCA on tasks of molecule captioning, IUPAC name prediction, and molecule-text retrieval, on which MolCA significantly outperforms the baselines. Our codes and checkpoints can be found at https://github.com/acharkq/MolCA.

* EMNLP main conference. 9 pages
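
The uni-modal adapter named in the MolCA abstract is LoRA, whose forward pass adds a scaled low-rank update to a frozen weight matrix: h = W x + (alpha / r) B A x. The sketch below shows that standard formula in plain Python; it is not MolCA's code, and the matrices, the helper `matvec`, and the values of `alpha` are made up for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """LoRA forward pass: frozen W plus the trainable low-rank update B @ A.

    W has shape (d_out, d_in), A has shape (r, d_in), B has shape (d_out, r).
    """
    r = len(A)
    base = matvec(W, x)                # frozen pre-trained projection
    update = matvec(B, matvec(A, x))   # low-rank path through rank r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 2x2 layer with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [2.0, 3.0]

# B starts at zero in LoRA, so the adapted layer initially matches the frozen one.
at_init = lora_forward(W, A, [[0.0], [0.0]], x, alpha=1.0)

# After (imagined) training, B is non-zero and shifts the output.
after_training = lora_forward(W, A, [[0.5], [-0.5]], x, alpha=1.0)
```

Only `A` and `B` would be trained, which is why the adapter adds few parameters relative to the language model itself.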
