Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Wu

HUB: Guiding Learned Optimizers with Continuous Prompt Tuning

May 31, 2023

Gaole Dai, Wei Wu, Ziyu Wang, Jie Fu, Shanghang Zhang, Tiejun Huang

Abstract:Learned optimizers are a crucial component of meta-learning. Recent advancements in scalable learned optimizers have demonstrated their superior performance over hand-designed optimizers in various tasks. However, certain characteristics of these models, such as an unstable learning curve, limited ability to handle unseen tasks and network architectures, difficult-to-control behaviours, and poor performance in fine-tuning tasks impede their widespread adoption. To tackle the issue of generalization in scalable learned optimizers, we propose a hybrid-update-based (HUB) optimization strategy inspired by recent advancements in hard prompt tuning and result selection techniques used in large language and vision models. This approach can be easily applied to any task that involves hand-designed or learned optimizer. By incorporating hand-designed optimizers as the second component in our hybrid approach, we are able to retain the benefits of learned optimizers while stabilizing the training process and, more importantly, improving testing performance. We validate our design through a total of 17 tasks, consisting of thirteen training from scratch and four fine-tuning settings. These tasks vary in model sizes, architectures, or dataset sizes, and the competing optimizers are hyperparameter-tuned. We outperform all competitors in 94% of the tasks with better testing performance. Furthermore, we conduct a theoretical analysis to examine the potential impact of our hybrid strategy on the behaviours and inherited traits of learned optimizers.

* Some table information is not accurate, author information not correct inside the pdf

Via

Access Paper or Ask Questions

PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

May 30, 2023

Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan

Figure 1 for PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

Figure 2 for PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

Figure 3 for PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

Figure 4 for PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

Abstract:While transformer-based pre-trained language models (PLMs) have dominated a number of NLP applications, these models are heavy to deploy and expensive to use. Therefore, effectively compressing large-scale PLMs becomes an increasingly important problem. Quantization, which represents high-precision tensors with low-bit fix-point format, is a viable solution. However, most existing quantization methods are task-specific, requiring customized training and quantization with a large number of trainable parameters on each individual task. Inspired by the observation that the over-parameterization nature of PLMs makes it possible to freeze most of the parameters during the fine-tuning stage, in this work, we propose a novel ``quantize before fine-tuning'' framework, PreQuant, that differs from both quantization-aware training and post-training quantization. PreQuant is compatible with various quantization strategies, with outlier-aware parameter-efficient fine-tuning incorporated to correct the induced quantization error. We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5. We also provide an empirical investigation into the workflow of PreQuant, which sheds light on its efficacy.

* Findings of ACL2023

Via

Access Paper or Ask Questions

RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank

May 26, 2023

Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Kai Chen, Rui Yan

Abstract:Unsupervised sentence representation learning is one of the fundamental problems in natural language processing with various downstream applications. Recently, contrastive learning has been widely adopted which derives high-quality sentence representations by pulling similar semantics closer and pushing dissimilar ones away. However, these methods fail to capture the fine-grained ranking information among the sentences, where each sentence is only treated as either positive or negative. In many real-world scenarios, one needs to distinguish and rank the sentences based on their similarities to a query sentence, e.g., very relevant, moderate relevant, less relevant, irrelevant, etc. In this paper, we propose a novel approach, RankCSE, for unsupervised sentence representation learning, which incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework. In particular, we learn semantically discriminative sentence representations by simultaneously ensuring ranking consistency between two representations with different dropout masks, and distilling listwise ranking knowledge from the teacher. An extensive set of experiments are conducted on both semantic textual similarity (STS) and transfer (TR) tasks. Experimental results demonstrate the superior performance of our approach over several state-of-the-art baselines.

* ACL 2023

Via

Access Paper or Ask Questions

MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Apr 19, 2023

Zicheng Zhang, Wei Wu, Wei Sun, Dangyang Tu, Wei Lu, Xiongkuo Min, Ying Chen, Guangtao Zhai

Figure 1 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 2 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 3 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 4 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Abstract:User-generated content (UGC) live videos are often bothered by various distortions during capture procedures and thus exhibit diverse visual qualities. Such source videos are further compressed and transcoded by media server providers before being distributed to end-users. Because of the flourishing of UGC live videos, effective video quality assessment (VQA) tools are needed to monitor and perceptually optimize live streaming videos in the distributing process. In this paper, we address \textbf{UGC Live VQA} problems by constructing a first-of-a-kind subjective UGC Live VQA database and developing an effective evaluation tool. Concretely, 418 source UGC videos are collected in real live streaming scenarios and 3,762 compressed ones at different bit rates are generated for the subsequent subjective VQA experiments. Based on the built database, we develop a \underline{M}ulti-\underline{D}imensional \underline{VQA} (\textbf{MD-VQA}) evaluator to measure the visual quality of UGC live videos from semantic, distortion, and motion aspects respectively. Extensive experimental results show that MD-VQA achieves state-of-the-art performance on both our UGC Live VQA database and existing compressed UGC VQA databases.

* Accepted to CVPR2023

Via

Access Paper or Ask Questions

M2GNN: Metapath and Multi-interest Aggregated Graph Neural Network for Tag-based Cross-domain Recommendation

Apr 16, 2023

Zepeng Huai, Yuji Yang, Mengdi Zhang, Zhongyi Zhang, Yichun Li, Wei Wu

Figure 1 for M2GNN: Metapath and Multi-interest Aggregated Graph Neural Network for Tag-based Cross-domain Recommendation

Figure 2 for M2GNN: Metapath and Multi-interest Aggregated Graph Neural Network for Tag-based Cross-domain Recommendation

Figure 3 for M2GNN: Metapath and Multi-interest Aggregated Graph Neural Network for Tag-based Cross-domain Recommendation

Figure 4 for M2GNN: Metapath and Multi-interest Aggregated Graph Neural Network for Tag-based Cross-domain Recommendation

Abstract:Cross-domain recommendation (CDR) is an effective way to alleviate the data sparsity problem. Content-based CDR is one of the most promising branches since most kinds of products can be described by a piece of text, especially when cold-start users or items have few interactions. However, two vital issues are still under-explored: (1) From the content modeling perspective, sufficient long-text descriptions are usually scarce in a real recommender system, more often the light-weight textual features, such as a few keywords or tags, are more accessible, which is improperly modeled by existing methods. (2) From the CDR perspective, not all inter-domain interests are helpful to infer intra-domain interests. Caused by domain-specific features, there are part of signals benefiting for recommendation in the source domain but harmful for that in the target domain. Therefore, how to distill useful interests is crucial. To tackle the above two problems, we propose a metapath and multi-interest aggregated graph neural network (M2GNN). Specifically, to model the tag-based contents, we construct a heterogeneous information network to hold the semantic relatedness between users, items, and tags in all domains. The metapath schema is predefined according to domain-specific knowledge, with one metapath for one domain. User representations are learned by GNN with a hierarchical aggregation framework, where the intra-metapath aggregation firstly filters out trivial tags and the inter-metapath aggregation further filters out useless interests. Offline experiments and online A/B tests demonstrate that M2GNN achieves significant improvements over the state-of-the-art methods and current industrial recommender system in Dianping, respectively. Further analysis shows that M2GNN offers an interpretable recommendation.

* Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

Via

Access Paper or Ask Questions

Multi-task Transformer with Relation-attention and Type-attention for Named Entity Recognition

Mar 20, 2023

Ying Mo, Hongyin Tang, Jiahao Liu, Qifan Wang, Zenglin Xu, Jingang Wang, Wei Wu, Zhoujun Li

Abstract:Named entity recognition (NER) is an important research problem in natural language processing. There are three types of NER tasks, including flat, nested and discontinuous entity recognition. Most previous sequential labeling models are task-specific, while recent years have witnessed the rising of generative models due to the advantage of unifying all NER tasks into the seq2seq model framework. Although achieving promising performance, our pilot studies demonstrate that existing generative models are ineffective at detecting entity boundaries and estimating entity types. This paper proposes a multi-task Transformer, which incorporates an entity boundary detection task into the named entity recognition task. More concretely, we achieve entity boundary detection by classifying the relations between tokens within the sentence. To improve the accuracy of entity-type mapping during decoding, we adopt an external knowledge base to calculate the prior entity-type distributions and then incorporate the information into the model via the self and cross-attention mechanisms. We perform experiments on an extensive set of NER benchmarks, including two flat, three nested, and three discontinuous NER datasets. Experimental results show that our approach considerably improves the generative NER model's performance.

* 5 pages,accepted ICASSP 2023

Via

Access Paper or Ask Questions

Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

Mar 14, 2023

Chao Xue, Di Liang, Sirui Wang, Wei Wu, Jing Zhang

Figure 1 for Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

Figure 2 for Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

Figure 3 for Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

Figure 4 for Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts

Abstract:Transformer-based pre-trained models have achieved great improvements in semantic matching. However, existing models still suffer from insufficient ability to capture subtle differences. The modification, addition and deletion of words in sentence pairs may make it difficult for the model to predict their relationship. To alleviate this problem, we propose a novel Dual Path Modeling Framework to enhance the model's ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics. Based on dual-path modeling framework we design the Dual Path Modeling Network (DPM-Net) to recognize semantic relations. And we conduct extensive experiments on 10 well-studied semantic matching and robustness test datasets, and the experimental results show that our proposed method achieves consistent improvements over baselines.

* ICASSP 2023
* ICASSP 2023. arXiv admin note: text overlap with arXiv:2210.03454

Via

Access Paper or Ask Questions

Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering

Mar 14, 2023

Yonghao Liu, Di Liang, Fang Fang, Sirui Wang, Wei Wu, Rui Jiang

Figure 1 for Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering

Figure 2 for Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering

Figure 3 for Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering

Figure 4 for Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering

Abstract:Knowledge graphs (KGs) have received increasing attention due to its wide applications on natural language processing. However, its use case on temporal question answering (QA) has not been well-explored. Most of existing methods are developed based on pre-trained language models, which might not be capable to learn \emph{temporal-specific} presentations of entities in terms of temporal KGQA task. To alleviate this problem, we propose a novel \textbf{T}ime-aware \textbf{M}ultiway \textbf{A}daptive (\textbf{TMA}) fusion network. Inspired by the step-by-step reasoning behavior of humans. For each given question, TMA first extracts the relevant concepts from the KG, and then feeds them into a multiway adaptive module to produce a \emph{temporal-specific} representation of the question. This representation can be incorporated with the pre-trained KG embedding to generate the final prediction. Empirical results verify that the proposed model achieves better performance than the state-of-the-art models in the benchmark dataset. Notably, the Hits@1 and Hits@10 results of TMA on the CronQuestions dataset's complex questions are absolutely improved by 24\% and 10\% compared to the best-performing baseline. Furthermore, we also show that TMA employing an adaptive fusion mechanism can provide interpretability by analyzing the proportion of information in question representations.

* ICASSP 2023
* ICASSP 2023

Via

Access Paper or Ask Questions

Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Feb 14, 2023

Chengcheng Han, Renyu Zhu, Jun Kuang, FengJiao Chen, Xiang Li, Ming Gao, Xuezhi Cao, Wei Wu

Figure 1 for Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Figure 2 for Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Figure 3 for Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Figure 4 for Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Abstract:Meta-learning methods have been widely used in few-shot named entity recognition (NER), especially prototype-based methods. However, the Other(O) class is difficult to be represented by a prototype vector because there are generally a large number of samples in the class that have miscellaneous semantics. To solve the problem, we propose MeTNet, which generates prototype vectors for entity types only but not O-class. We design an improved triplet network to map samples and prototype vectors into a low-dimensional space that is easier to be classified and propose an adaptive margin for each entity type. The margin plays as a radius and controls a region with adaptive size in the low-dimensional space. Based on the regions, we propose a new inference procedure to predict the label of a query instance. We conduct extensive experiments in both in-domain and cross-domain settings to show the superiority of MeTNet over other state-of-the-art methods. In particular, we release a Chinese few-shot NER dataset FEW-COMM extracted from a well-known e-commerce platform. To the best of our knowledge, this is the first Chinese few-shot NER dataset. All the datasets and codes are provided at https://github.com/hccngu/MeTNet.

Via

Access Paper or Ask Questions

FFHR: Fully and Flexible Hyperbolic Representation for Knowledge Graph Completion

Feb 07, 2023

Wentao Shi, Junkang Wu, Xuezhi Cao, Jiawei Chen, Wenqiang Lei, Wei Wu, Xiangnan He

Abstract:Learning hyperbolic embeddings for knowledge graph (KG) has gained increasing attention due to its superiority in capturing hierarchies. However, some important operations in hyperbolic space still lack good definitions, making existing methods unable to fully leverage the merits of hyperbolic space. Specifically, they suffer from two main limitations: 1) existing Graph Convolutional Network (GCN) methods in hyperbolic space rely on tangent space approximation, which would incur approximation error in representation learning, and 2) due to the lack of inner product operation definition in hyperbolic space, existing methods can only measure the plausibility of facts (links) with hyperbolic distance, which is difficult to capture complex data patterns. In this work, we contribute: 1) a Full Poincar\'{e} Multi-relational GCN that achieves graph information propagation in hyperbolic space without requiring any approximation, and 2) a hyperbolic generalization of Euclidean inner product that is beneficial to capture both hierarchical and complex patterns. On this basis, we further develop a \textbf{F}ully and \textbf{F}lexible \textbf{H}yperbolic \textbf{R}epresentation framework (\textbf{FFHR}) that is able to transfer recent Euclidean-based advances to hyperbolic space. We demonstrate it by instantiating FFHR with four representative KGC methods. Extensive experiments on benchmark datasets validate the superiority of our FFHRs over their Euclidean counterparts as well as state-of-the-art hyperbolic embedding methods.

* submit to TKDE

Via

Access Paper or Ask Questions