Weijie Liu

Whitening Sentence Representations for Better Semantics and Faster Retrieval

Mar 29, 2021

Jianlin Su, Jiarun Cao, Weijie Liu, Yangyiwen Ou

Abstract: Pre-trained models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models is still worth exploring. Previous work has shown that anisotropy is a critical bottleneck for BERT-based sentence representations, hindering the model from fully utilizing the underlying semantic features. Therefore, some attempts to boost the isotropy of the sentence distribution, such as flow-based models, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation from traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique can also reduce the dimensionality of the sentence representation. Our experimental results show that it not only achieves promising performance but also significantly reduces storage cost and accelerates retrieval.

* The source code of this paper is available at https://github.com/bojone/BERT-whitening
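
To make the whitening idea in the abstract concrete, here is a minimal NumPy sketch. It is not taken from the linked repository; it assumes the standard recipe of centering the embeddings, taking the SVD of their covariance, and rescaling each principal direction to unit variance. The names `compute_whitening` and `whiten`, the parameter `k`, and the random toy data standing in for sentence embeddings are all illustrative assumptions.

```python
import numpy as np

def compute_whitening(embeddings, k=None):
    """Return (mu, W) so that (x - mu) @ W has identity covariance.

    embeddings: array of shape (n_sentences, dim).
    k: optional number of leading components to keep (dimensionality reduction).
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov(embeddings.T)            # (dim, dim) covariance matrix
    u, s, _ = np.linalg.svd(cov)          # cov = u @ diag(s) @ u.T
    W = u @ np.diag(1.0 / np.sqrt(s))     # rotate, then rescale to unit variance
    if k is not None:
        W = W[:, :k]                      # keep the k highest-variance directions
    return mu, W

def whiten(embeddings, mu, W):
    """Apply a previously computed whitening transform."""
    return (embeddings - mu) @ W

# Toy stand-in for sentence embeddings: correlated, non-centered 8-dim vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8)) + 3.0

mu, W = compute_whitening(X, k=4)
Z = whiten(X, mu, W)                      # shape (500, 4), decorrelated
```

Keeping only the first `k` columns of `W` is what would give the storage and retrieval savings the abstract mentions: the stored vectors are shorter, and similarity search runs over fewer dimensions.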

Partial Gromov-Wasserstein Learning for Partial Graph Matching

Dec 09, 2020

Weijie Liu, Chao Zhang, Jiahao Xie, Zebang Shen, Hui Qian, Nenggan Zheng

Abstract: Graph matching finds the correspondence of nodes across two graphs and is a basic task in graph-based machine learning. Numerous existing methods match every node in one graph to one node in the other graph, whereas two graphs usually overlap only partially in many real-world applications. In this paper, a partial Gromov-Wasserstein learning framework is proposed for partially matching two graphs, which fuses the partial Gromov-Wasserstein distance and the partial Wasserstein distance as the objective and updates the partial transport map and the node embedding in an alternating fashion. The proposed framework transports a fraction of the probability mass and matches node pairs with high relative similarities across the two graphs. Incorporating an embedding learning method, heterogeneous graphs can also be matched. Numerical experiments on both synthetic and real-world graphs demonstrate that our framework can improve the F1 score by at least 20% and often much more.

BiTT: Bidirectional Tree Tagging for Joint Extraction of Overlapping Entities and Relations

Sep 07, 2020

Xukun Luo, Weijie Liu, Meng Ma, Ping Wang

Abstract: Joint extraction refers to extracting triples, composed of entities and relations, simultaneously from text with a single model. However, most existing methods fail to extract all triples accurately and efficiently from sentences with the overlapping issue, i.e., where the same entity is included in multiple triples. In this paper, we propose a novel scheme called Bidirectional Tree Tagging (BiTT) to label overlapping triples in text. In BiTT, the triples with the same relation category in a sentence are represented as two binary trees, each of which is converted into a word-level tag sequence to label each word. Based on the BiTT scheme, we develop an end-to-end extraction framework to predict the BiTT tags and further extract triples efficiently. We adopt Bi-LSTM and BERT, respectively, as the encoder in our framework, and obtain promising results on public English and Chinese datasets.

* 15 pages, 5 figures

FastBERT: a Self-distilling BERT with Adaptive Inference Time

Apr 29, 2020

Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

Abstract: Pre-trained language models like BERT have proven to be highly performant. However, they are often computationally expensive in many practical scenarios, since such heavy models can hardly be readily implemented with limited resources. To improve their efficiency with an assured model performance, we propose a novel speed-tunable FastBERT with adaptive inference time. The speed at inference can be flexibly adjusted under varying demands, while redundant calculation of samples is avoided. Moreover, this model adopts a unique self-distillation mechanism at fine-tuning, further enabling a greater computational efficacy with minimal loss in performance. Our model achieves promising results on twelve English and Chinese datasets. Given different speedup thresholds, it can run from 1 to 12 times faster than BERT, allowing a speed-performance tradeoff.

* This manuscript has been accepted to appear at ACL 2020
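
The abstract does not spell out how the speedup threshold controls inference, so the following is only a sketch of one plausible early-exit rule: each layer has its own classifier, and inference stops at the first layer whose prediction is confident enough. Confidence is measured here by normalized entropy, which is an assumption of this sketch, as are the names `normalized_entropy`, `adaptive_predict`, and `speed`, and the hand-written per-layer probabilities.

```python
import math

def normalized_entropy(probs):
    """Entropy of a class distribution, scaled to [0, 1] by log(num_classes)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def adaptive_predict(layer_probs, speed):
    """Return (label, exit_depth) for one sample.

    layer_probs: non-empty list of class distributions, one per layer,
                 as produced by hypothetical per-layer classifiers.
    speed: threshold in [0, 1]; higher values allow earlier exits.
    """
    for depth, probs in enumerate(layer_probs, start=1):
        if normalized_entropy(probs) < speed:
            break  # confident enough: skip the remaining layers
    # If no layer was confident, probs/depth refer to the last layer.
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, depth

# Toy per-layer outputs for a two-class problem, growing more confident with depth.
layer_probs = [[0.5, 0.5], [0.7, 0.3], [0.95, 0.05]]
early = adaptive_predict(layer_probs, speed=0.9)   # exits at layer 2
late = adaptive_predict(layer_probs, speed=0.5)    # runs to layer 3
```

In this sketch, raising `speed` trades accuracy for fewer executed layers, which matches the speed-performance tradeoff described in the abstract.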

A Decentralized Proximal Point-type Method for Saddle Point Problems

Oct 31, 2019

Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

Abstract: In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $\rho$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$ where the approximation error depends linearly on $\sqrt{\rho}$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a generative adversarial network (GAN) in a decentralized manner match our theoretical guarantees.

* 18 pages
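
For intuition about the proximal point method that the paper decentralizes, here is a toy sketch of the classical single-node update on the bilinear problem $\min_x \max_y xy$. This is not the paper's algorithm: there is no network, no constraint set, and the problem is convex-concave. The step size `eta` and the function names are illustrative assumptions. The implicit update $x^+ = x - \eta y^+$, $y^+ = y + \eta x^+$ has a closed-form solution for this objective.

```python
import math

def proximal_point_step(x, y, eta):
    """One proximal point step for f(x, y) = x * y (min over x, max over y).

    Solves the implicit system x+ = x - eta * y+, y+ = y + eta * x+ exactly.
    """
    denom = 1.0 + eta * eta
    return (x - eta * y) / denom, (y + eta * x) / denom

def run_proximal_point(x, y, eta, steps):
    """Iterate the proximal point step and return the final point."""
    for _ in range(steps):
        x, y = proximal_point_step(x, y, eta)
    return x, y

x1, y1 = proximal_point_step(1.0, 1.0, eta=0.5)        # one step from (1, 1)
x_end, y_end = run_proximal_point(1.0, 1.0, eta=0.5, steps=50)
distance = math.hypot(x_end, y_end)                    # distance to the saddle point (0, 0)
```

Each step shrinks the distance to the saddle point $(0, 0)$ by a factor of $1/\sqrt{1+\eta^2}$, whereas plain simultaneous gradient descent-ascent spirals outward on this same problem.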

K-BERT: Enabling Language Representation with Knowledge Graph

Sep 17, 2019

Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang

Abstract: Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called the knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and a visible matrix to limit the impact of knowledge. Because it can load model parameters from the pre-trained BERT, K-BERT can easily inject domain knowledge into the model by being equipped with a KG, without pre-training by oneself. Our investigation reveals promising results on twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving knowledge-driven problems that require experts.

* 8 pages, 20190917
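
The abstract names soft-position and a visible matrix without defining them, so the sketch below is one simplified reading rather than the model's actual implementation. It assumes that injected triple tokens form a branch attached to an entity in the original sentence, that branch tokens continue the position count from their entity, and that a branch is visible only to itself and to its entity. The function name, the `branches` format, the example sentence, and the restriction to at most one branch per entity are all illustrative assumptions.

```python
def soft_positions_and_visibility(trunk, branches):
    """Flatten a sentence with injected knowledge and build its visibility mask.

    trunk: list of original sentence tokens.
    branches: dict mapping a trunk index to the list of tokens injected
              after that entity (assumed: at most one branch per entity).
    Returns (tokens, soft_pos, visible), where visible[a][b] says whether
    token a may attend to token b.
    """
    tokens, soft_pos, group, is_branch = [], [], [], []
    for i, tok in enumerate(trunk):
        tokens.append(tok)
        soft_pos.append(i)            # trunk tokens keep their original position
        group.append(i)
        is_branch.append(False)
        for j, btok in enumerate(branches.get(i, []), start=1):
            tokens.append(btok)
            soft_pos.append(i + j)    # branch continues counting from its entity
            group.append(i)           # remember which entity the branch hangs on
            is_branch.append(True)

    n = len(tokens)
    visible = [[False] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if not is_branch[a] and not is_branch[b]:
                visible[a][b] = True                  # trunk tokens all see each other
            else:
                visible[a][b] = group[a] == group[b]  # branch sees itself and its entity
    return tokens, soft_pos, visible

trunk = ["Tim", "Cook", "is", "visiting", "Beijing", "now"]
branches = {1: ["CEO", "Apple"], 4: ["capital", "China"]}
tokens, soft_pos, visible = soft_positions_and_visibility(trunk, branches)
```

Under these assumptions the injected token "Apple" can attend to "Cook" and "CEO" but not to "Tim" or "China", which is how the mask would keep injected knowledge from changing the meaning of unrelated parts of the sentence.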
