Jiawei Han

MentorGNN: Deriving Curriculum for Pre-Training GNNs

Aug 21, 2022
Dawei Zhou, Lecheng Zheng, Dongqi Fu, Jiawei Han, Jingrui He

Graph pre-training strategies have been attracting a surge of attention in the graph mining community, due to their flexibility in parameterizing graph neural networks (GNNs) without any label information. The key idea lies in encoding valuable information into the backbone GNNs, by predicting the masked graph signals extracted from the input graphs. In order to balance the importance of diverse graph signals (e.g., nodes, edges, subgraphs), the existing approaches are mostly hand-engineered by introducing hyperparameters to re-weight the importance of graph signals. However, human interventions with sub-optimal hyperparameters often inject additional bias and deteriorate the generalization performance in the downstream applications. This paper addresses these limitations from a new perspective, i.e., deriving curriculum for pre-training GNNs. We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs with diverse structures and disparate feature spaces. To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weights graph signals in order to ensure good generalization in the target domain. Moreover, we shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs. Extensive experiments on a wealth of real graphs validate the performance of MentorGNN.

* Accepted by CIKM 2022

Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation

Jun 28, 2022
Jiaxin Huang, Yu Meng, Jiawei Han

We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type. Recently, prompt-based tuning has demonstrated superior performance to standard fine-tuning in few-shot scenarios by formulating the entity type classification task as a "fill-in-the-blank" problem. This allows effective utilization of the strong language modeling capability of Pre-trained Language Models (PLMs). Despite the success of current prompt-based tuning approaches, two major challenges remain: (1) the verbalizer in prompts is either manually designed or constructed from external knowledge bases, without considering the target corpus and label hierarchy information, and (2) current approaches mainly utilize the representation power of PLMs, but have not explored their generation power acquired through extensive general-domain pre-training. In this work, we propose a novel framework for few-shot FET consisting of two modules: (1) an entity type label interpretation module automatically learns to relate type labels to the vocabulary by jointly leveraging few-shot instances and the label hierarchy, and (2) a type-based contextualized instance generator produces new instances based on given instances to enlarge the training set for better generalization. On three benchmark datasets, our model outperforms existing methods by significant margins. Code can be found at https://github.com/teapot123/Fine-Grained-Entity-Typing.
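
To make the "fill-in-the-blank" formulation and the role of a verbalizer concrete, here is a minimal sketch of how a manually designed verbalizer could turn masked-token probabilities into a type prediction. It is only an illustration of the generic prompt-based setup the abstract describes as prior practice, not the paper's label interpretation module. The type names, label words, probabilities, and the function names `score_types` and `predict_type` are all hypothetical.

```python
from typing import Dict, List


def score_types(mask_probs: Dict[str, float],
                verbalizer: Dict[str, List[str]]) -> Dict[str, float]:
    """Average the [MASK]-position probabilities of each type's label words."""
    scores = {}
    for entity_type, label_words in verbalizer.items():
        # Words the language model gave no mass to count as probability 0.
        probs = [mask_probs.get(word, 0.0) for word in label_words]
        scores[entity_type] = sum(probs) / len(probs) if probs else 0.0
    return scores


def predict_type(mask_probs: Dict[str, float],
                 verbalizer: Dict[str, List[str]]) -> str:
    """Return the entity type whose label words score highest."""
    scores = score_types(mask_probs, verbalizer)
    return max(scores, key=scores.get)


# Hypothetical hand-written verbalizer: entity type -> label words.
VERBALIZER = {
    "/person/artist": ["singer", "painter", "musician"],
    "/person/athlete": ["player", "athlete"],
    "/organization/company": ["company", "firm"],
}

# Made-up probabilities a PLM might assign to [MASK] in the prompt
# "Adele released a new album. Adele is a [MASK]."
MASK_PROBS = {"singer": 0.41, "musician": 0.22, "player": 0.05, "company": 0.02}

predicted = predict_type(MASK_PROBS, VERBALIZER)
print(predicted)
```

In this toy setup the label words are fixed by hand, which is exactly the limitation the abstract points out: nothing ties them to the target corpus or to the label hierarchy.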

* Accepted to KDD 2022 Research Track

TeKo: Text-Rich Graph Neural Networks with External Knowledge

Jun 15, 2022
Zhizhi Yu, Di Jin, Jianguo Wei, Ziyang Liu, Yue Shang, Yun Xiao, Jiawei Han, Lingfei Wu

Graph Neural Networks (GNNs) have gained great popularity in tackling various analytical tasks on graph-structured data (i.e., networks). Typical GNNs and their variants follow a message-passing scheme that obtains network representations by propagating features along the network topology, which, however, ignores the rich textual semantics (e.g., local word sequences) that exist in many real-world networks. Existing methods for text-rich networks integrate textual semantics mainly by utilizing internal information such as topics or phrases/words, and so often fail to mine the text semantics comprehensively, limiting the reciprocal guidance between network structure and text semantics. To address these problems, we propose a novel text-rich graph neural network with external knowledge (TeKo), in order to take full advantage of both structural and textual information within text-rich networks. Specifically, we first present a flexible heterogeneous semantic network that incorporates high-quality entities and interactions among documents and entities. We then introduce two types of external knowledge, that is, structured triplets and unstructured entity descriptions, to gain a deeper insight into textual semantics. We further design a reciprocal convolutional mechanism for the constructed heterogeneous semantic network, enabling network structure and textual semantics to collaboratively enhance each other and learn high-level network representations. Extensive experimental results on four public text-rich networks as well as a large-scale e-commerce searching dataset illustrate the superior performance of TeKo over state-of-the-art baselines.

Unsupervised Key Event Detection from Massive Text Corpora

Jun 08, 2022
Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han

Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge. As real-world events have different granularities, from the top-level themes to key events and then to event mentions corresponding to concrete actions, there are generally two lines of research: (1) theme detection identifies from a news corpus major themes (e.g., "2019 Hong Kong Protests" vs. "2020 U.S. Presidential Election") that have very distinct semantics; and (2) action extraction extracts from one document mention-level actions (e.g., "the police hit the left arm of the protester") that are too fine-grained for comprehending the event. In this paper, we propose a new task, key event detection at the intermediate level, aiming to detect from a news corpus key events (e.g., "HK Airport Protest on Aug. 12-14"), each happening at a particular time/location and focusing on the same topic. This task can bridge event understanding and structuring and is inherently challenging because of the thematic and temporal closeness of key events and the scarcity of labeled data due to the fast-evolving nature of news articles. To address these challenges, we develop an unsupervised key event detection framework, EvMine, that (1) extracts temporally frequent peak phrases using a novel ttf-itf score, (2) merges peak phrases into event-indicative feature sets by detecting communities from our designed peak phrase graph that captures document co-occurrences, semantic similarities, and temporal closeness signals, and (3) iteratively retrieves documents related to each key event by training a classifier with automatically generated pseudo labels from the event-indicative feature sets and refining the detected key events using the retrieved documents. Extensive experiments and case studies show EvMine outperforms all the baseline methods and its ablations on two real-world news corpora.

* KDD 2022

Schema-Guided Event Graph Completion

Jun 06, 2022
Hongwei Wang, Zixuan Zhang, Sha Li, Jiawei Han, Yizhou Sun, Hanghang Tong, Joseph P. Olive, Heng Ji

We tackle a new task, event graph completion, which aims to predict missing event nodes for event graphs. Existing link prediction or graph completion methods have difficulty dealing with event graphs because they are usually designed for a single large graph such as a social network or a knowledge graph, rather than multiple small dynamic event graphs. Moreover, they can only predict missing edges rather than missing nodes. In this work, we propose to utilize event schema, a template that describes the stereotypical structure of event graphs, to address the above issues. Our schema-guided event graph completion approach first maps an instance event graph to a subgraph of the schema graph by a heuristic subgraph matching algorithm. Then it predicts whether a candidate event node in the schema graph should be added to the instantiated schema subgraph by characterizing two types of local topology of the schema graph: neighbors of the candidate node and the subgraph, and paths that connect the candidate node and the subgraph. These two modules are then combined for the final prediction. We also propose a self-supervised strategy to construct training samples, as well as an inference algorithm that is specifically designed to complete event graphs. Extensive experimental results on four datasets demonstrate that our proposed method achieves state-of-the-art performance, with 4.3% to 19.4% absolute F1 gains over the best baseline method on the four datasets.
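
The two kinds of local topology mentioned above (neighbors and connecting paths) can be illustrated with simple hand-crafted features on a toy schema graph. This is a rough sketch under strong assumptions: the paper learns its predictions with dedicated modules, whereas the code below only computes a neighbor-overlap ratio and a breadth-first shortest-path length. The schema, the matched subgraph, and the function names `neighbor_overlap` and `shortest_path_length` are hypothetical, and the schema is treated as undirected for simplicity.

```python
from collections import deque
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> neighbor set


def neighbor_overlap(schema: Graph, candidate: str, matched: Set[str]) -> float:
    """Fraction of the candidate's schema neighbors already in the matched subgraph."""
    neighbors = schema.get(candidate, set())
    if not neighbors:
        return 0.0
    return len(neighbors & matched) / len(neighbors)


def shortest_path_length(schema: Graph, candidate: str,
                         matched: Set[str]) -> Optional[int]:
    """Breadth-first search distance from the candidate to the nearest matched node."""
    if candidate in matched:
        return 0
    seen = {candidate}
    queue = deque([(candidate, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in schema.get(node, set()):
            if nxt in matched:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # candidate is disconnected from the matched subgraph


# Hypothetical event schema (event types as nodes).
SCHEMA: Graph = {
    "attack": {"injury", "investigation"},
    "injury": {"attack", "hospitalization"},
    "hospitalization": {"injury"},
    "investigation": {"attack", "arrest"},
    "arrest": {"investigation", "trial"},
    "trial": {"arrest"},
}

# Schema nodes that the instance event graph was matched to.
MATCHED = {"attack", "injury"}

for candidate in ("hospitalization", "investigation", "trial"):
    print(candidate,
          neighbor_overlap(SCHEMA, candidate, MATCHED),
          shortest_path_length(SCHEMA, candidate, MATCHED))
```

Candidates that sit close to the matched subgraph and share many neighbors with it are intuitively more likely to be missing events; how the two signals are actually scored and combined is specific to the paper and not reproduced here.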

All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass

May 22, 2022
Jiaxin Huang, Tianqi Liu, Jialu Liu, Adam D. Lelkes, Cong Yu, Jiawei Han

Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks. In real industrial applications such as web content classification, multiple classification tasks are predicted from the same input text such as a web article. However, at serving time, existing multitask transformer models such as prompt-based or adapter-based approaches need to conduct N forward passes for N tasks, with O(N) computation cost. To tackle this problem, we propose a scalable method that can achieve stronger performance with close to O(1) computation cost via only one forward pass. To illustrate real application usage, we release a multitask dataset on news topic and style classification. Our experiments show that our proposed method outperforms strong baselines on both the GLUE benchmark and our news dataset. Our code and dataset are publicly available at https://bit.ly/mtop-code.
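
The difference between N forward passes and a single one can be seen in a small toy example. The sketch below is not the paper's method, which the abstract does not detail; it only contrasts running an expensive encoder once per task with running it once and sharing the result across lightweight task heads. `CountingEncoder`, the two heads, and the input text are hypothetical stand-ins.

```python
from typing import Callable, Dict, List


class CountingEncoder:
    """Stand-in for an expensive transformer encoder; counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> List[float]:
        self.calls += 1
        # Toy features: word count and number of exclamation marks.
        return [float(len(text.split())), float(text.count("!"))]


Head = Callable[[List[float]], str]


def per_task_inference(encoder: CountingEncoder, heads: Dict[str, Head],
                       text: str) -> Dict[str, str]:
    """One encoder pass per task: encoder cost grows linearly with the task count."""
    return {task: head(encoder(text)) for task, head in heads.items()}


def shared_inference(encoder: CountingEncoder, heads: Dict[str, Head],
                     text: str) -> Dict[str, str]:
    """One encoder pass in total; every head reads the same features."""
    features = encoder(text)
    return {task: head(features) for task, head in heads.items()}


# Hypothetical lightweight task heads.
HEADS: Dict[str, Head] = {
    "length": lambda f: "long" if f[0] > 5 else "short",
    "style": lambda f: "sensational" if f[1] > 0 else "neutral",
}
TEXT = "Markets rally!"

enc_per_task = CountingEncoder()
enc_shared = CountingEncoder()
result_per_task = per_task_inference(enc_per_task, HEADS, TEXT)
result_shared = shared_inference(enc_shared, HEADS, TEXT)
print(enc_per_task.calls, enc_shared.calls)
```

Both routes give the same predictions here, but the shared route calls the encoder once regardless of how many heads are attached, which is the cost profile the abstract targets.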

Heterformer: A Transformer Architecture for Node Representation Learning on Heterogeneous Text-Rich Networks

May 20, 2022
Bowen Jin, Yu Zhang, Qi Zhu, Jiawei Han

We study node representation learning on heterogeneous text-rich networks, where nodes and edges are multi-typed and some types of nodes are associated with text information. Although recent studies on graph neural networks (GNNs) and pretrained language models (PLMs) have demonstrated their power in encoding network and text signals, respectively, less focus has been given to delicately coupling these two types of models on heterogeneous text-rich networks. Specifically, existing GNNs rarely model text in each node in a contextualized way; existing PLMs can hardly be applied to characterize graph structures due to their sequence architecture. In this paper, we propose Heterformer, a Heterogeneous GNN-nested transformer that blends GNNs and PLMs into a unified model. Different from previous "cascaded architectures" that directly add GNN layers upon a PLM, our Heterformer alternately stacks two modules - a graph-attention-based neighbor aggregation module and a transformer-based text and neighbor joint encoding module - to facilitate thorough mutual enhancement between network and text signals. Meanwhile, Heterformer is capable of characterizing network heterogeneity and nodes without text information. Comprehensive experiments on three large-scale datasets from different domains demonstrate the superiority of Heterformer over state-of-the-art baselines in link prediction, transductive/inductive node classification, node clustering, and semantics-based retrieval.

CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation

May 12, 2022
Yuning Mao, Ming Zhong, Jiawei Han

Scientific extreme summarization (TLDR) aims to form ultra-short summaries of scientific papers. Previous efforts on curating scientific TLDR datasets failed to scale up due to the heavy human annotation and domain expertise required. In this paper, we propose a simple yet effective approach to automatically extracting TLDR summaries for scientific papers from their citation texts. Based on the proposed approach, we create a new benchmark CiteSum without human annotation, which is around 30 times larger than the previous human-curated dataset SciTLDR. We conduct a comprehensive analysis of CiteSum, examining its data characteristics and establishing strong baselines. We further demonstrate the usefulness of CiteSum by adapting models pre-trained on CiteSum (named CITES) to new tasks and domains with limited supervision. For scientific extreme summarization, CITES outperforms most fully-supervised methods on SciTLDR without any fine-tuning and obtains state-of-the-art results with only 128 examples. For news extreme summarization, CITES achieves significant gains on XSum over its base model (not pre-trained on CiteSum), e.g., +7.2 ROUGE-1 zero-shot performance and state-of-the-art few-shot performance. For news headline generation, CITES performs the best among unsupervised and zero-shot methods on Gigaword.

* TLDR: By pretraining on (automatically extracted) citation sentences in scientific papers, we achieve SOTA on SciTLDR, XSum, and Gigaword in zero-shot and/or few-shot settings

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

May 04, 2022
Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han

Discovering latent topics from text corpora has been studied for decades. Many existing topic models adopt a fully unsupervised setting, and their discovered topics may not cater to users' particular interests due to their inability to leverage user guidance. Although there exist seed-guided topic discovery approaches that leverage user-provided seeds to discover topic-representative terms, they are less concerned with two factors: (1) the existence of out-of-vocabulary seeds and (2) the power of pre-trained language models (PLMs). In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds. We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other. Experiments on three real datasets from different domains demonstrate the effectiveness of SeeTopic in terms of topic coherence, accuracy, and diversity.

* 12 pages; Accepted to NAACL 2022

OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

Apr 29, 2022
Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han

Automatic extraction of product attributes from their textual descriptions is essential for the online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products -- new types of products, each with its own set of new attributes, appear constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arise from constantly changing data. In this work, we study the attribute mining problem in an open-world setting to extract novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.

* WWW 2022
