Yihong Chen

TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents

Aug 07, 2023
Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Guoqing Du, Shiwei Shi, Hangyu Mao, Xingyu Zeng, Rui Zhao

With recent advancements in natural language processing, Large Language Models (LLMs) have emerged as powerful tools for various real-world applications. Despite their prowess, the intrinsic generative abilities of LLMs may prove insufficient for handling complex tasks which necessitate a combination of task planning and the usage of external tools. In this paper, we first propose a structured framework tailored for LLM-based AI Agents and discuss the crucial capabilities necessary for tackling intricate problems. Within this framework, we design two distinct types of agents (i.e., one-step agent and sequential agent) to execute the inference process. Subsequently, we instantiate the framework using various LLMs and evaluate their Task Planning and Tool Usage (TPTU) abilities on typical tasks. By highlighting key findings and challenges, our goal is to provide a helpful resource for researchers and practitioners to leverage the power of LLMs in their AI applications. Our study emphasizes the substantial potential of these models, while also identifying areas that need more investigation and improvement.
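
The two agent types can be sketched roughly as follows. `fake_llm`, the tool names, and the prompt formats are illustrative stand-ins, not the paper's actual implementation: a one-step agent plans the whole tool sequence in a single LLM call, while a sequential agent plans one tool call at a time and feeds each result back into its context.

```python
# Toy sketch of the two agent styles. `fake_llm` and the tools are stubs
# (assumptions for illustration), not the framework from the paper.

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    if "Plan all steps" in prompt:
        return "search; calculate"   # a full plan, returned in one shot
    return "search"                  # the stub always picks the search tool

TOOLS = {
    "search": lambda task: f"results for {task}",
    "calculate": lambda task: f"answer to {task}",
}

def one_step_agent(task: str) -> list[str]:
    # Plan the entire tool sequence in a single LLM call, then execute it.
    plan = fake_llm(f"Plan all steps for: {task}")
    return [TOOLS[step.strip()](task) for step in plan.split(";")]

def sequential_agent(task: str, max_steps: int = 2) -> list[str]:
    # Plan one tool call at a time, feeding each result back into context.
    history, results = task, []
    for _ in range(max_steps):
        step = fake_llm(f"Next tool given: {history}")
        results.append(TOOLS[step](task))
        history += f" -> {step}: {results[-1]}"
    return results

print(one_step_agent("query"))  # ['results for query', 'answer to query']
```

The trade-off the paper studies is visible even in this stub: the one-step agent commits to a plan before seeing any tool output, while the sequential agent can condition each step on what the tools returned.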

Improving Language Plasticity via Pretraining with Active Forgetting

Jul 04, 2023
Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

Pretrained language models (PLMs) are today the primary models for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown that it is possible to address this issue by learning a new embedding layer for the new language, doing so is both data- and compute-inefficient. We propose an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only converge faster during language adaptation but also outperform standard ones in the low-data regime, particularly for languages that are distant from English.
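
The reset schedule at the heart of active forgetting fits in a few lines. The model, sizes, and loop below are toy placeholders (not RoBERTa); only the every-K-updates re-initialisation of the embedding layer reflects the mechanism described above.

```python
import numpy as np

# Toy sketch of the active-forgetting schedule: the embedding layer is
# re-initialised every K updates while the transformer body keeps training.
# All shapes and the training step itself are placeholders.

rng = np.random.default_rng(0)
VOCAB, DIM, K, STEPS = 50, 8, 100, 350

emb = rng.normal(scale=0.02, size=(VOCAB, DIM))   # embedding layer
body = rng.normal(scale=0.02, size=(DIM, DIM))    # stand-in for the body
resets = 0

for step in range(1, STEPS + 1):
    # ...one gradient update on `emb` and `body` would happen here...
    if step % K == 0:
        # Forgetting event: only the embeddings are thrown away and re-drawn.
        emb = rng.normal(scale=0.02, size=(VOCAB, DIM))
        resets += 1

print(resets)  # 3 resets in 350 steps with K=100
```

Because the body repeatedly has to cope with fresh embeddings, it is pushed toward representations that a new embedding layer (e.g. for a new language) can latch onto quickly.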

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Dec 20, 2022
Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

Prior work has shown that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. In this work, we propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model, and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model and build a mini-model by extracting and freezing a few layers and learning a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.4x less compute.
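
The MiniPost-style workflow can be sketched as follows. The "layers" are plain matrices and the training step is omitted; this only illustrates training new embeddings against a shallow prefix of the large model and then plugging them into the full, aligned stack.

```python
import numpy as np

# Rough sketch of mini-model adaptation: new-language embeddings are trained
# cheaply against a shallow copy of the large model's bottom layers, then
# plugged into the aligned full model. Shapes and layer counts are illustrative.

rng = np.random.default_rng(0)
DIM, FULL_LAYERS, MINI_LAYERS = 16, 12, 4

full_model = [rng.normal(size=(DIM, DIM)) for _ in range(FULL_LAYERS)]
mini_model = full_model[:MINI_LAYERS]   # MiniPost-style: frozen bottom layers

def forward(layers, x):
    for w in layers:
        x = np.tanh(x @ w)
    return x

# New-language embeddings would be optimised against the mini-model only,
# requiring forward/backward passes over a fraction of the parameters...
new_emb = rng.normal(size=(100, DIM))   # placeholder for learned embeddings
h_mini = forward(mini_model, new_emb)

# ...then the same embeddings are plugged into the full model for transfer.
h_full = forward(full_model, new_emb)
print(h_mini.shape, h_full.shape)       # both (100, 16)
```

The compute saving comes from the training loop touching only `MINI_LAYERS` of the `FULL_LAYERS`-deep stack; MiniJoint instead pretrains the mini-model jointly via a secondary MLM head at a middle layer.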

Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection

Sep 13, 2022
Ziwei Zhao, Dong Wang, Yihong Chen, Ziteng Wang, Liwei Wang

Detecting masses in mammograms is important given the high incidence and mortality of breast cancer. In mammogram mass detection, explicitly modeling pairwise lesion correspondence is particularly important. However, most existing methods build relatively coarse correspondences and do not use correspondence supervision. In this paper, we propose CL-Net, a new transformer-based framework that learns lesion detection and pairwise correspondence in an end-to-end manner. In CL-Net, a View-Interactive Lesion Detector achieves dynamic interaction across candidates from different views, while a Lesion Linker uses correspondence supervision to guide the interaction process more accurately. Together, these two designs yield a precise understanding of pairwise lesion correspondence in mammograms. Experiments show that CL-Net achieves state-of-the-art performance on the public DDSM dataset and our in-house dataset. Moreover, it outperforms previous methods by a large margin in the low-FPI (false positives per image) regime.

* Accepted by ECCV 2022 
PointScatter: Point Set Representation for Tubular Structure Extraction

Sep 13, 2022
Dong Wang, Zhao Zhang, Ziwei Zhao, Yuhang Liu, Yihong Chen, Liwei Wang

This paper explores point set representations for tubular structure extraction tasks. Compared with the traditional mask representation, a point set offers greater flexibility and representational power, since it is not restricted to a fixed grid as a mask is. Inspired by this, we propose PointScatter, an alternative to segmentation models for tubular structure extraction. PointScatter splits the image into scatter regions and predicts points for each scatter region in parallel. We further propose a greedy region-wise bipartite matching algorithm to train the network efficiently and end-to-end. We benchmark PointScatter on four public tubular datasets, and extensive experiments on tubular structure segmentation and centerline extraction demonstrate the effectiveness of our approach. Code is available at https://github.com/zhangzhao2022/pointscatter.

* ECCV 2022 (Oral) 
Boosting 3D Object Detection via Object-Focused Image Fusion

Jul 21, 2022
Hao Yang, Chen Shi, Yihong Chen, Liwei Wang

3D object detection has achieved remarkable progress by taking point clouds as the only input. However, point clouds often suffer from incomplete geometric structures and a lack of semantic information, which makes it hard for detectors to accurately classify detected objects. In this work, we focus on how to effectively use object-level information from images to boost the performance of point-based 3D detectors. We present DeMF, a simple yet effective method for fusing image information into point features. Given a set of point features and image feature maps, DeMF adaptively aggregates image features by taking the projected 2D location of each 3D point as a reference. We evaluate our method on the challenging SUN RGB-D dataset, improving state-of-the-art results by a large margin (+2.1 mAP@0.25 and +2.3 mAP@0.5). Code is available at https://github.com/haoy945/DeMF.
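
The core fusion idea can be sketched as follows: project each 3D point into the image, gather the image feature at that 2D location, and combine it with the point feature. The intrinsics and feature map below are toy values, and the hard nearest-pixel lookup is a simplification; DeMF itself aggregates adaptively with learned attention around the reference point.

```python
import numpy as np

# Toy sketch: project 3D points through a pinhole camera, gather image
# features at the projected pixels, and concatenate with point features.

K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])              # toy pinhole intrinsics

def project(points):                          # points: (N, 3) in camera coords
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]           # perspective divide -> pixels

def fuse(point_feats, points, image_feats):
    h, w = image_feats.shape[:2]
    uv = np.round(project(points)).astype(int)
    uv[:, 0] = np.clip(uv[:, 0], 0, w - 1)    # x -> column
    uv[:, 1] = np.clip(uv[:, 1], 0, h - 1)    # y -> row
    sampled = image_feats[uv[:, 1], uv[:, 0]] # nearest-pixel gather
    return np.concatenate([point_feats, sampled], axis=1)

points = np.array([[0.0, 0.0, 1.0], [0.5, 0.5, 1.0]])
point_feats = np.ones((2, 4))
image_feats = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
fused = fuse(point_feats, points, image_feats)
print(fused.shape)  # (2, 7): 4 point dims + 3 image dims
```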

ReFactorGNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective

Jul 21, 2022
Yihong Chen, Pushkar Mishra, Luca Franceschi, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and to generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing ReFactorGNNs. This new architecture draws upon both modelling paradigms, which previously were largely thought of as disjoint. Concretely, using a message-passing formalism, we show how FMs can be cast as GNNs by reformulating the gradient descent procedure as message-passing operations, which forms the basis of our ReFactorGNNs. Across a multitude of well-established KGC benchmarks, our ReFactorGNNs achieve comparable transductive performance to FMs, and state-of-the-art inductive performance while using an order of magnitude fewer parameters.
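
The FM-as-GNN observation can be seen in a few lines with DistMult, whose triple score is the trilinear product of head, relation, and tail embeddings: one gradient step on the score updates each entity with a sum of messages from its neighbours, exactly the shape of a message-passing layer. The loss terms, negative sampling, and normalisation of the actual paper are omitted in this toy sketch.

```python
import numpy as np

# Toy illustration: one ascent step on DistMult's score
# s(h, r, t) = <e_h, w_r, e_t> is a message-passing update.

rng = np.random.default_rng(0)
N_ENT, N_REL, DIM, LR = 4, 2, 3, 0.1
E = rng.normal(size=(N_ENT, DIM))      # entity embeddings (node states)
W = rng.normal(size=(N_REL, DIM))      # relation embeddings
triples = [(0, 0, 1), (2, 1, 1), (3, 0, 0)]

def gradient_step_as_message_passing(E, W, triples):
    msg = np.zeros_like(E)
    for h, r, t in triples:
        # d s / d e_h = w_r * e_t  and  d s / d e_t = w_r * e_h:
        msg[h] += W[r] * E[t]          # message from tail to head
        msg[t] += W[r] * E[h]          # message from head to tail
    return E + LR * msg                # aggregate-and-update, as in a GNN layer

E2 = gradient_step_as_message_passing(E, W, triples)
```

Entity 1 receives messages along both of its incident edges, which is the node-level aggregation the ReFactorGNN architecture builds on.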

Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations

Oct 06, 2021
Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp

Learning good representations on multi-relational graphs is essential to knowledge base completion (KBC). In this paper, we propose a new self-supervised training objective for multi-relational graph representation learning, obtained by simply incorporating relation prediction into the commonly used 1vsAll objective. The new training objective contains not only terms for predicting the subject and object of a given triple, but also a term for predicting the relation type. We analyse how this new objective impacts multi-relational learning in KBC: experiments on a variety of datasets and models show that relation prediction can significantly improve entity ranking, the most widely used evaluation task for KBC, yielding a 6.1% increase in MRR and a 9.9% increase in Hits@1 on FB15k-237, as well as a 3.1% increase in MRR and a 3.4% increase in Hits@1 on Aristo-v4. Moreover, we observe that the proposed objective is especially effective on highly multi-relational datasets, i.e., datasets with a large number of predicates, and generates better representations when larger embedding sizes are used.

* AKBC 2021 
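
The augmented objective is a small change to write down: for a triple (s, r, o), score all candidate objects, all candidate subjects, and additionally all candidate relations, then sum three cross-entropy terms. DistMult scoring is used below for concreteness (an assumption; the paper applies the idea across several models), and regularisation is omitted.

```python
import numpy as np

# Sketch of 1vsAll with the added relation-prediction term.

rng = np.random.default_rng(0)
N_ENT, N_REL, DIM = 5, 3, 4
E = rng.normal(size=(N_ENT, DIM))       # entity embeddings
R = rng.normal(size=(N_REL, DIM))       # relation embeddings

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def loss(s, r, o):
    obj_scores = (E[s] * R[r]) @ E.T    # 1vsAll: predict the object
    sub_scores = (R[r] * E[o]) @ E.T    # 1vsAll: predict the subject
    rel_scores = (E[s] * E[o]) @ R.T    # new term: predict the relation
    return -(log_softmax(obj_scores)[o]
             + log_softmax(sub_scores)[s]
             + log_softmax(rel_scores)[r])

print(loss(0, 1, 2) > 0)  # a positive cross-entropy-style loss
```

Only the third term is new relative to the standard 1vsAll objective, which is what makes the method easy to add to existing KBC training code.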
Learnable Embedding Sizes for Recommender Systems

Jan 19, 2021
Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, Yong Li

Embedding-based representation learning is commonly used in deep learning recommendation models to map raw sparse features to dense vectors. The traditional approach of assigning a uniform embedding size to all features has two issues. First, the numerous features inevitably lead to a gigantic embedding table with high memory cost. Second, it is likely to cause over-fitting for features that do not require a large representation capacity. Existing works that try to address this problem either cause a significant drop in recommendation performance or suffer from unaffordable training time costs. In this paper, we propose a novel approach, named PEP (short for Plug-in Embedding Pruning), that reduces the size of the embedding table while avoiding a drop in accuracy and remaining computationally efficient. PEP prunes embedding parameters, where the pruning threshold(s) can be adaptively learned from data. We can therefore automatically obtain a mixed-dimension embedding scheme by pruning redundant parameters for each feature. PEP is a general framework that can be plugged into various base recommendation models. Extensive experiments demonstrate that it can efficiently cut down embedding parameters and boost the base model's performance. Specifically, it achieves strong recommendation performance while reducing parameters by 97-99%. In terms of computation cost, PEP only adds 20-30% extra training time compared with base models. Code is available at https://github.com/ssui-liu/learnable-embed-sizes-for-RecSys.

* International Conference on Learning Representations (ICLR), 2021 
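
The pruning step can be sketched as a soft-threshold reparameterisation: entries whose magnitude falls below a (learnable) threshold are clamped to exactly zero, and the rest are shrunk. The threshold value and embedding matrix below are toy numbers; in PEP the threshold parameter is optimised jointly with the embeddings rather than fixed as here.

```python
import numpy as np

def soft_threshold_prune(emb, thr):
    # Keep the sign, shrink magnitudes by a sigmoid-activated threshold,
    # and clamp anything that falls below it to exactly zero.
    t = 1.0 / (1.0 + np.exp(-thr))   # sigmoid keeps the threshold positive
    return np.sign(emb) * np.maximum(np.abs(emb) - t, 0.0)

emb = np.array([[0.9, -0.05, 0.3],
                [-0.02, 0.7, -0.4]])
pruned = soft_threshold_prune(emb, thr=-2.0)   # sigmoid(-2) ~ 0.119
sparsity = float((pruned == 0).mean())
print(sparsity)  # 2 of 6 entries pruned -> ~0.333
```

Because the zero pattern differs per feature, the surviving entries form exactly the mixed-dimension embedding scheme described above, with per-feature sizes discovered rather than hand-tuned.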
RepPoints V2: Verification Meets Regression for Object Detection

Jul 16, 2020
Yihong Chen, Zheng Zhang, Yue Cao, Liwei Wang, Stephen Lin, Han Hu

Verification and regression are two general methodologies for prediction in neural networks. Each has its own strengths: verification can be easier to infer accurately, while regression is more efficient and applicable to continuous target variables. Hence, it is often beneficial to carefully combine them to take advantage of both. In this paper, we apply this philosophy to improve state-of-the-art object detection, specifically RepPoints. Although RepPoints provides high performance, we find that its heavy reliance on regression for object localization leaves room for improvement. We introduce verification tasks into the localization prediction of RepPoints, producing RepPoints v2, which provides consistent improvements of about 2.0 mAP over the original RepPoints on the COCO object detection benchmark across different backbones and training methods. RepPoints v2 also achieves 52.1 mAP on COCO test-dev with a single model. Moreover, we show that the proposed approach can more generally elevate other object detection frameworks, as well as applications such as instance segmentation. The code is available at https://github.com/Scalsol/RepPointsV2.
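
One way to picture combining the two methodologies for localization: a continuously regressed position is refined by snapping to the highest-scoring cell of a verification heatmap within a small window. This is only a schematic of the regression-plus-verification idea; RepPoints v2's actual design (joint heads and learned fusion) is considerably more involved.

```python
import numpy as np

def refine_corner(reg_xy, heatmap, window=1):
    # Coarse position from regression, rounded to the grid.
    x, y = (int(round(v)) for v in reg_xy)
    h, w = heatmap.shape
    x0, x1 = max(x - window, 0), min(x + window + 1, w)
    y0, y1 = max(y - window, 0), min(y + window + 1, h)
    # Snap to the strongest verification response in the local window.
    patch = heatmap[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
    return (int(x0 + dx), int(y0 + dy))

heatmap = np.zeros((5, 5))
heatmap[2, 3] = 1.0                        # verification peak
print(refine_corner((2.0, 2.0), heatmap))  # (3, 2)
```

Regression alone would keep the slightly-off estimate (2, 2); the verification signal pulls it onto the peak, which is the kind of complementarity the paper exploits.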
