In this paper, we address an often-overlooked issue: \textit{similarity should be determined from different perspectives}. To explore this issue, we release a Multi-Perspective Text Similarity (MPTS) dataset, in which sentence similarities are labeled from twelve perspectives. Furthermore, we conduct a series of experimental analyses on this task by retrofitting several well-known text matching models. Finally, we draw several conclusions and provide baseline models, laying the foundation for further investigation of this issue. The dataset and code are publicly available on GitHub.\footnote{\url{https://github.com/autoliuweijie/MPTS}}
Graph matching finds the correspondence of nodes across two correlated graphs and lies at the core of many applications. When graph side information is not available, the node correspondence must be estimated solely from the network topologies. In this paper, we propose a novel criterion for measuring graph matching accuracy, structural inconsistency (SI), which is defined on the network topological structure. Specifically, SI incorporates the heat diffusion wavelet to capture the multi-hop structure of the graphs. Based on SI, we propose a Structural Inconsistency reducing Graph Matching Algorithm (SIGMA), which, in each iteration, improves the alignment scores of node pairs with low SI values. Under suitable assumptions, SIGMA can reduce the SI values of true counterparts. Furthermore, we demonstrate that SIGMA can be derived by using a mirror descent method to solve the Gromov-Wasserstein distance with a novel K-hop-structure-based matching cost. Extensive experiments show that our method outperforms state-of-the-art methods.
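The heat diffusion wavelet that SI relies on can be illustrated with the standard graph heat kernel $\exp(-sL)$. The sketch below is a minimal illustration under our own assumptions (combinatorial Laplacian $L = D - A$, a single scale `s`, dense eigendecomposition); it does not reproduce how SIGMA combines the wavelet into SI. It shows that, for $s > 0$, a node's row of the kernel already places nonzero weight on nodes several hops away, which is what lets a diffusion-based criterion see multi-hop structure.

```python
import numpy as np

def heat_kernel(A: np.ndarray, s: float) -> np.ndarray:
    """Return exp(-s * L) for the combinatorial Laplacian L = D - A (assumed choice)."""
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-s * evals)) @ evecs.T

# Path graph 0 - 1 - 2 - 3: nodes 0 and 3 are three hops apart.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = heat_kernel(A, s=1.0)
# Node 0's diffusion profile: largest on itself, decaying with hop distance,
# but still nonzero at node 3.
print(np.round(H[0], 3))
```

Because $L\mathbf{1} = 0$, each row of the kernel sums to one, so a row can be read as a diffusion profile; the scale `s` controls how far that profile spreads.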
Optimal transport (OT) naturally arises in a wide range of machine learning applications but can often become a computational bottleneck. Recently, one line of work has proposed solving OT approximately by searching for the \emph{transport plan} in a low-rank subspace. However, the optimal transport plan is often not low-rank, which tends to yield large approximation errors. For example, when Monge's \emph{transport map} exists, the transport plan is full rank. This paper addresses computing the OT distance with adequate accuracy and efficiency. We propose a novel approximation of OT in which the transport plan is decomposed into the sum of a low-rank matrix and a sparse matrix. We theoretically analyze the approximation error and then design an augmented Lagrangian method to calculate the transport plan efficiently.
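The full-rank example can be checked numerically in the simplest setting, which we assume here: two uniform discrete measures supported on the same number of points $n$. An optimal plan is then a permutation matrix scaled by $1/n$ (found below with SciPy's `linear_sum_assignment`), which has rank $n$, so no rank-$r$ plan with $r < n$ can represent it exactly. This is an illustration of the motivation only, not the proposed low-rank plus sparse method; the helper name `uniform_ot_plan` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def uniform_ot_plan(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Optimal plan between uniform measures on equally many points X and Y."""
    n = X.shape[0]
    # Squared Euclidean cost matrix C[i, j] = ||x_i - y_j||^2.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # With uniform weights, an optimal plan is a scaled permutation (a Monge map).
    rows, cols = linear_sum_assignment(C)
    plan = np.zeros((n, n))
    plan[rows, cols] = 1.0 / n
    return plan

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))
plan = uniform_ot_plan(X, Y)
print(np.linalg.matrix_rank(plan))  # 8: the plan is full rank
```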
Expensive bounding-box annotations have limited the development of object detection. It is therefore necessary to address the more challenging task of few-shot object detection, which requires the detector to recognize objects of novel classes from only a few training samples. Many popular meta-learning-based methods, such as the Meta R-CNN series, have achieved promising performance. However, they use only a single category of support data at a time as attention to guide detection on the query images, leaving the relevance among categories unexploited. Moreover, many recent works treat the support data and query images as independent branches without considering the relationship between them. To address these issues, we propose a dynamic relevance learning model, which uses the relationship between all support images and the Regions of Interest (RoIs) on the query images to construct a dynamic graph convolutional network (GCN). By adjusting the prediction distribution of the base detector with the output of this GCN, the proposed model guides the detector to improve the class representation implicitly. Comprehensive experiments on the Pascal VOC and MS-COCO datasets show that the proposed model achieves the best overall performance, demonstrating its effectiveness in learning more generalized features. Our code is available at https://github.com/liuweijie19980216/DRL-for-FSOD.
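A graph built jointly over support and RoI features can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's specification: each support category is summarized by one feature vector, edge weights are a row-wise softmax of scaled dot products, and one GCN layer with a hypothetical weight matrix `W` and a ReLU is applied. The adjacency is dynamic in the sense that it is recomputed from the current features for every query image.

```python
import numpy as np

def dynamic_gcn_layer(support: np.ndarray, rois: np.ndarray, W: np.ndarray):
    """One GCN layer on a graph whose nodes are support prototypes and RoIs.

    support: (C, d) one feature per support category (assumed)
    rois:    (R, d) RoI features from a query image
    W:       (d, d_out) hypothetical learnable weight matrix
    """
    X = np.vstack([support, rois])               # (C + R, d) node features
    scores = X @ X.T / np.sqrt(X.shape[1])       # pairwise relevance scores
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for exp
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-normalized dynamic adjacency
    H = np.maximum(A @ X @ W, 0.0)               # propagate, transform, ReLU
    return A, H

rng = np.random.default_rng(0)
support = rng.normal(size=(5, 16))   # e.g. 5 support categories
rois = rng.normal(size=(10, 16))     # e.g. 10 RoIs in one query image
W = rng.normal(size=(16, 8)) * 0.1
A, H = dynamic_gcn_layer(support, rois, W)
```

How the GCN output is then used to adjust the base detector's prediction distribution is not specified in the abstract and is left out of the sketch.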
Pre-trained models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models remains worth exploring. Previous work has shown that the anisotropy problem is a critical bottleneck for BERT-based sentence representations, preventing the model from fully utilizing the underlying semantic features. Therefore, some attempts to boost the isotropy of the sentence distribution, such as flow-based models, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation from traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique can also reduce the dimensionality of sentence representations. Our experimental results show that it not only achieves promising performance but also significantly reduces storage cost and accelerates retrieval.
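The whitening step can be sketched in NumPy with the standard construction: subtract the mean, then map with $W = U\Lambda^{-1/2}$, where $\mathrm{cov} = U\Lambda U^\top$ comes from the SVD of the covariance. Keeping only the first `k` columns of $W$ gives the dimensionality reduction. The helper names `fit_whitening` and `apply_whitening` are hypothetical, and random correlated vectors stand in for BERT sentence embeddings.

```python
import numpy as np
from typing import Optional

def fit_whitening(emb: np.ndarray, k: Optional[int] = None):
    """Estimate mean mu and whitening matrix W, keeping the top-k directions."""
    mu = emb.mean(axis=0, keepdims=True)
    cov = np.cov(emb - mu, rowvar=False)
    # SVD of the symmetric covariance: cov = U diag(S) U^T, S sorted descending.
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S))
    if k is not None:
        W = W[:, :k]  # keep the k directions with the largest variance
    return mu, W

def apply_whitening(emb: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    return (emb - mu) @ W

rng = np.random.default_rng(0)
# Anisotropic stand-in for sentence embeddings: correlated and non-centered.
emb = rng.normal(size=(500, 16)) @ rng.normal(size=(16, 16)) + 3.0
mu, W = fit_whitening(emb, k=8)
Z = apply_whitening(emb, mu, W)  # 8-dimensional, zero mean, identity covariance
```

Since $W^\top \mathrm{cov}\, W = I_k$, the reduced vectors are isotropic, and storing `k` instead of `d` dimensions is where the storage and retrieval savings come from.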
Graph matching finds the correspondence of nodes across two graphs and is a basic task in graph-based machine learning. Numerous existing methods match every node in one graph to a node in the other graph, whereas in many \realworld{} applications the two graphs overlap only partially. In this paper, we propose a partial Gromov-Wasserstein learning framework for partially matching two graphs. Its objective fuses the partial Gromov-Wasserstein distance and the partial Wasserstein distance, and it updates the partial transport map and the node embeddings in an alternating fashion. The proposed framework transports only a fraction of the probability mass and matches node pairs with high relative similarities across the two graphs. By incorporating an embedding learning method, the framework can also match heterogeneous graphs. Numerical experiments on both synthetic and \realworld{} graphs demonstrate that our framework can improve the F1 score by at least $20\%$ and often much more.
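Transporting only a fraction of the mass is the idea behind the partial Wasserstein distance: minimize $\langle C, T\rangle$ subject to $T\mathbf{1} \le a$, $T^\top\mathbf{1} \le b$, $\mathbf{1}^\top T\mathbf{1} = m$, $T \ge 0$. The sketch below solves this linear program directly with SciPy's `linprog`, requiring $m \le \min(\sum a, \sum b)$. It covers only the partial Wasserstein term; the partial Gromov-Wasserstein term, the embedding updates, and the alternating scheme are not shown, and the dense LP is chosen for clarity rather than speed. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def partial_wasserstein(C: np.ndarray, a: np.ndarray, b: np.ndarray, m: float) -> np.ndarray:
    """Transport exactly mass m under cost C without exceeding marginals a, b."""
    n1, n2 = C.shape
    c = C.ravel()  # row-major: T[i, j] -> variable i * n2 + j
    # Row sums: sum_j T[i, j] <= a[i]
    A_rows = np.kron(np.eye(n1), np.ones((1, n2)))
    # Column sums: sum_i T[i, j] <= b[j]
    A_cols = np.kron(np.ones((1, n1)), np.eye(n2))
    A_ub = np.vstack([A_rows, A_cols])
    b_ub = np.concatenate([a, b])
    # Total transported mass equals m.
    A_eq = np.ones((1, n1 * n2))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[m],
                  bounds=(0, None), method="highs")
    return res.x.reshape(n1, n2)

rng = np.random.default_rng(0)
C = rng.random((4, 3))              # e.g. costs between 4 and 3 nodes
a, b = np.full(4, 0.25), np.full(3, 1 / 3)
T = partial_wasserstein(C, a, b, m=0.5)
```

With `m` below the total mass, the solver keeps the cheapest (most similar) pairs and leaves the rest of both graphs unmatched.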
Joint extraction refers to extracting triples, composed of entities and relations, simultaneously from text with a single model. However, most existing methods fail to extract all triples accurately and efficiently from sentences with the overlapping issue, i.e., where the same entity is included in multiple triples. In this paper, we propose a novel scheme called Bidirectional Tree Tagging (BiTT) to label overlapping triples in text. In BiTT, the triples with the same relation category in a sentence are represented as two binary trees, each of which is converted into a word-level tag sequence that labels each word. Based on the BiTT scheme, we develop an end-to-end extraction framework that predicts the BiTT tags and then extracts triples efficiently. We adopt Bi-LSTM and BERT, respectively, as the encoder of our framework and obtain promising results on public English and Chinese datasets.
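Before tagging, the triples of a sentence need to be grouped by relation category, and the overlapping issue can be detected by counting how many triples each entity appears in. The hypothetical helpers below sketch only this preprocessing on a toy example; they do not build the two binary trees or the BiTT tags, whose exact format is not given in the abstract.

```python
from collections import Counter, defaultdict

def group_by_relation(triples):
    """Map each relation category to its (head, tail) entity pairs."""
    groups = defaultdict(list)
    for head, rel, tail in triples:
        groups[rel].append((head, tail))
    return dict(groups)

def overlapping_entities(triples):
    """Entities that occur in more than one triple (the overlapping issue)."""
    counts = Counter()
    for head, _, tail in triples:
        for entity in {head, tail}:  # count each entity once per triple
            counts[entity] += 1
    return {e for e, c in counts.items() if c > 1}

triples = [("Paris", "capital_of", "France"),
           ("Paris", "located_in", "France"),
           ("Lyon", "located_in", "France")]
groups = group_by_relation(triples)        # one group per relation category
shared = overlapping_entities(triples)     # entities shared across triples
```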
Joint extraction refers to extracting triples, composed of entities and relations, simultaneously from text with a single model, but existing methods rarely work well on sentences with the overlapping issue, i.e., where the same entity is included in multiple triples. In this paper, we propose a novel Bidirectional Tree Tagging (BiTT) scheme to label overlapping triples in text. In a sentence, the triples with the same relation category are represented as two binary trees, each of which is converted into a word-level tag sequence that labels each word. Based on our BiTT scheme, we develop an end-to-end classification framework to predict the BiTT tags. We adopt Bi-LSTM layers and a pre-trained BERT encoder, respectively, as its encoder module, and obtain promising results on a public English dataset as well as a Chinese one. The source code is publicly available at https://anonymous/for/review.
Pre-trained language models like BERT have proven to be highly performant. However, they are computationally expensive in many practical scenarios, as such heavy models are hard to deploy with limited resources. To improve their efficiency while maintaining model performance, we propose a novel speed-tunable FastBERT with adaptive inference time. The inference speed can be flexibly adjusted under varying demands, while redundant computation on samples is avoided. Moreover, the model adopts a unique self-distillation mechanism during fine-tuning, further improving computational efficiency with minimal loss in performance. Our model achieves promising results on twelve English and Chinese datasets. Given different speedup thresholds that trade off speed against performance, it runs 1 to 12 times faster than BERT.
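Adaptive inference time is commonly implemented as early exit: a classifier sits after each layer, and a sample leaves at the first layer whose prediction is confident enough. The sketch below assumes (our assumption for illustration) that uncertainty is measured by normalized entropy and that the speedup threshold `speed` is compared against it, so a larger `speed` lets more samples exit early. The per-layer probabilities are made-up inputs, not model outputs, and the helper names are hypothetical.

```python
import numpy as np

def normalized_entropy(p: np.ndarray) -> float:
    """Entropy of distribution p divided by log(num_classes), in [0, 1]."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

def adaptive_exit(layer_probs, speed: float):
    """Return (layers used, prediction) for the first sufficiently certain layer."""
    for i, p in enumerate(layer_probs):
        if normalized_entropy(p) < speed:
            return i + 1, p          # exit early: skip the remaining layers
    return len(layer_probs), layer_probs[-1]  # fall back to the last layer

# Toy per-layer predictions for one sample, growing more confident with depth.
layer_probs = [np.array([0.5, 0.5]),
               np.array([0.7, 0.3]),
               np.array([0.95, 0.05])]
depth_fast, _ = adaptive_exit(layer_probs, speed=0.9)
depth_slow, _ = adaptive_exit(layer_probs, speed=0.5)
```

With these toy inputs, `speed=0.9` exits after layer 2 (normalized entropy about 0.88), while `speed=0.5` waits until layer 3; raising the threshold trades accuracy for fewer layers per sample.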