Zhouhan Lin

I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction

Oct 10, 2023
Yusheng Huang, Zhouhan Lin

Multimodal information extraction, which requires aggregating representations from different modalities, is attracting increasing research attention. In this paper, we present the Intra- and Inter-Sample Relationship Modeling (I2SRM) method for this task, which contains two modules. First, the intra-sample relationship modeling module operates on a single sample and aims to learn effective representations; embeddings from the textual and visual modalities are shifted to bridge the modality gap caused by distinct pre-trained language and image models. Second, the inter-sample relationship modeling module considers relationships among multiple samples and focuses on capturing their interactions. An AttnMixup strategy is proposed, which not only enables collaboration among samples but also augments data to improve generalization. We conduct extensive experiments on the multimodal named entity recognition datasets Twitter-2015 and Twitter-2017, and the multimodal relation extraction dataset MNRE. Our proposed method I2SRM achieves competitive results: 77.12% F1-score on Twitter-2015, 88.40% F1-score on Twitter-2017, and 84.12% F1-score on MNRE.
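
The AttnMixup strategy is only named in the abstract; as a rough illustration of the idea of letting samples in a batch interact while mixing them for augmentation, the following PyTorch sketch interpolates each sample with an attention-weighted combination of the other samples in the batch. The function attn_mixup, the dot-product attention, and the Beta-sampled mixing ratio are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def attn_mixup(features, labels, alpha=0.2):
    """features: (B, D) fused multimodal embeddings; labels: (B, C) one-hot / multi-hot floats."""
    B, D = features.shape
    # Pairwise attention among samples in the batch (self-pairs masked out).
    scores = features @ features.t() / D ** 0.5
    scores.fill_diagonal_(float("-inf"))
    attn = F.softmax(scores, dim=-1)                       # (B, B)
    partners = attn @ features                             # attention-pooled partner per sample
    partner_labels = attn @ labels
    # Mixup-style interpolation between each sample and its attention-pooled partner.
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1))
    return lam * features + (1 - lam) * partners, lam * labels + (1 - lam) * partner_labels

x = torch.randn(4, 16)
y = F.one_hot(torch.tensor([0, 1, 2, 0]), num_classes=3).float()
mixed_x, mixed_y = attn_mixup(x, y)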

Tailoring Self-Attention for Graph via Rooted Subtrees

Oct 08, 2023
Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin

Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with a theoretical proof that STA approximates global attention under extreme settings. By allowing direct computation of attention weights among multi-hop neighbors, STA mitigates the inherent problems of existing graph attention mechanisms. Further, we devise an efficient form of STA by employing kernelized softmax, which yields linear time complexity. Our resulting GNN architecture, STAGNN, presents a simple yet performant STA-based graph neural network leveraging a hop-aware attention strategy. Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
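
For readers who want a concrete picture of how kernelized softmax can make hop-restricted attention linear, here is a small PyTorch sketch under stated assumptions: the elu+1 feature map, the dense row-normalized adjacency, and the uniform averaging over hops are illustrative choices rather than the exact STA/STAGNN design (see the linked repository for the real implementation).

import torch
import torch.nn.functional as F

def kernel_map(x):
    # Positive feature map commonly used in linear attention.
    return F.elu(x) + 1.0

def subtree_attention(q, k, v, adj_norm, num_hops=3):
    """q, k, v: (N, D) node projections; adj_norm: (N, N) row-normalized adjacency."""
    phi_q, phi_k = kernel_map(q), kernel_map(k)
    M = torch.einsum("nd,ne->nde", phi_k, v)               # per-node outer products phi(k_j) v_j^T
    z = phi_k
    hop_outputs = []
    for _ in range(num_hops):
        M = torch.einsum("nm,mde->nde", adj_norm, M)       # push key-value statistics one hop outward
        z = adj_norm @ z
        numer = torch.einsum("nd,nde->ne", phi_q, M)       # kernelized softmax numerator
        denom = (phi_q * z).sum(dim=-1, keepdim=True)      # kernelized softmax normalizer
        hop_outputs.append(numer / (denom + 1e-6))
    return torch.stack(hop_outputs).mean(dim=0)            # hop combination (here: plain mean)

q = k = v = torch.randn(6, 4)
adj = torch.rand(6, 6)
adj = adj / adj.sum(dim=1, keepdim=True)
print(subtree_attention(q, k, v, adj).shape)               # torch.Size([6, 4])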

* Accepted at NeurIPS 2023. 23 pages in total with the appendix 

Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization

Jun 08, 2023
Cheng Deng, Tianhang Zhang, Zhongmou He, Qiyuan Chen, Yuanyuan Shi, Le Zhou, Luoyi Fu, Weinan Zhang, Xinbing Wang, Chenghu Zhou, Zhouhan Lin, Junxian He

Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience, with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBenchmark, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pretrained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on over 1 million pieces of geoscience literature and utilize GeoSignal's supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Experiments conducted on GeoBenchmark demonstrate the effectiveness of our approach and datasets.
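
A minimal sketch of the two-stage recipe described above (continued pretraining on geoscience text, then supervised fine-tuning on GeoSignal-style data), written with Hugging Face transformers. The checkpoint name, file paths, and hyperparameters are placeholders, and the released K2 training code may differ substantially.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "huggyllama/llama-7b"                               # assumed LLaMA-7B checkpoint name
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Stage 1: continued pretraining on raw geoscience literature (causal LM objective).
corpus = load_dataset("text", data_files={"train": "geoscience_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tok(b["text"], truncation=True, max_length=2048),
                    batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tok, mlm=False)
Trainer(model=model,
        args=TrainingArguments("k2-pretrain", per_device_train_batch_size=1,
                               gradient_accumulation_steps=32, num_train_epochs=1),
        train_dataset=corpus, data_collator=collator).train()

# Stage 2 (instruction tuning) would repeat the same loop on GeoSignal-style
# (instruction, response) pairs rendered into single prompt strings.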

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

May 24, 2023
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin

The transformer model is known to be computationally demanding and prohibitively costly for long sequences, since the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence by using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA, with significant improvement in both speed and space. For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART and other efficient models. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer.
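
As a concrete picture of the core operation, the sketch below downsamples a hidden sequence spectrally: a DCT along the sequence axis, truncation of the high-frequency coefficients, and an inverse DCT at the shorter length. The keep ratio and the use of SciPy on NumPy arrays are illustrative assumptions; in the actual model this happens on hidden states inside transformer layers.

import numpy as np
from scipy.fft import dct, idct

def dct_downsample(hidden, keep_ratio=0.5):
    """hidden: (seq_len, d_model) array; returns roughly (seq_len * keep_ratio, d_model)."""
    keep = max(1, int(hidden.shape[0] * keep_ratio))
    coeffs = dct(hidden, type=2, axis=0, norm="ortho")     # spectrum over sequence positions
    truncated = coeffs[:keep]                              # drop the redundant high-frequency part
    return idct(truncated, type=2, axis=0, norm="ortho")   # shorter sequence, same feature width

x = np.random.randn(128, 64)
print(dct_downsample(x, keep_ratio=0.25).shape)            # (32, 64)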

Asymmetric Polynomial Loss For Multi-Label Classification

Apr 10, 2023
Yusheng Huang, Jiexing Qi, Xinbing Wang, Zhouhan Lin

Various tasks are reformulated as multi-label classification problems, in which the binary cross-entropy (BCE) loss is frequently utilized for optimizing well-designed models. However, the vanilla BCE loss cannot be tailored for diverse tasks, resulting in suboptimal performance across different models. Besides, the imbalance between redundant negative samples and rare positive samples can degrade model performance. In this paper, we propose an effective Asymmetric Polynomial Loss (APL) to mitigate the above issues. Specifically, we first perform a Taylor expansion on the BCE loss. Then we ameliorate the coefficients of the polynomial terms. We further employ an asymmetric focusing mechanism to decouple the gradient contributions from the negative and positive samples. Moreover, we validate that the polynomial coefficients can recalibrate the asymmetric focusing hyperparameters. Experiments on relation extraction, text classification, and image classification show that our APL loss can consistently improve performance without extra training burden.
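
To make the recipe concrete, here is a hedged PyTorch sketch that truncates the Taylor expansion of BCE, rescales the leading polynomial coefficient, and applies different focusing exponents to positives and negatives. The number of retained terms, the coefficient scheme (eps_pos, eps_neg), and the default gammas are illustrative assumptions; the published APL may define them differently.

import torch

def asymmetric_polynomial_loss(logits, targets, eps_pos=2.0, eps_neg=1.0,
                               gamma_pos=0.0, gamma_neg=2.0, n_terms=4):
    """logits, targets: (B, C); targets are {0, 1} multi-hot floats."""
    p = torch.sigmoid(logits)
    ks = torch.arange(1, n_terms + 1, device=logits.device, dtype=p.dtype)
    # Truncated Taylor series: -log(p) ~ sum_k (1-p)^k / k and -log(1-p) ~ sum_k p^k / k.
    coef_pos = torch.ones_like(ks)
    coef_pos[0] = eps_pos                                  # recalibrate the leading coefficient
    coef_neg = torch.ones_like(ks)
    coef_neg[0] = eps_neg
    pos_poly = (coef_pos * ((1 - p).unsqueeze(-1) ** ks) / ks).sum(-1)
    neg_poly = (coef_neg * (p.unsqueeze(-1) ** ks) / ks).sum(-1)
    # Asymmetric focusing: down-weight easy negatives more aggressively than positives.
    loss = targets * pos_poly * (1 - p) ** gamma_pos + (1 - targets) * neg_poly * p ** gamma_neg
    return loss.mean()

logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 2, (8, 5)).float()
asymmetric_polynomial_loss(logits, targets).backward()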

* ICASSP 2023 

Few-Shot Table-to-Text Generation with Prompt Planning and Knowledge Memorization

Feb 24, 2023
Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang

Pre-trained language models (PLMs) have achieved remarkable advances in table-to-text generation tasks. However, the lack of labeled domain-specific knowledge and the topology gap between tabular data and text make it difficult for PLMs to yield faithful text. Low-resource generation likewise faces unique challenges in this domain. Inspired by how humans describe tabular data with prior knowledge, we propose a new framework, PromptMize, which targets table-to-text generation under few-shot settings. The design of our framework consists of two aspects: a prompt planner and a knowledge adapter. The prompt planner aims to generate a prompt signal that provides instance guidance for PLMs to bridge the topology gap between tabular data and text. Moreover, the knowledge adapter memorizes domain-specific knowledge from the unlabelled corpus to supply essential information during generation. Extensive experiments and analyses are conducted on three open-domain few-shot NLG datasets: human, song, and book. Compared with previous state-of-the-art approaches, our model achieves remarkable performance in generation quality as judged by human and automatic evaluations.
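
A toy end-to-end sketch of the two components described above: a prompt planner that linearizes the table into an instance-specific prompt, and a retrieval step standing in for the knowledge adapter, both feeding a pretrained seq2seq PLM. The function names, the keyword-overlap retrieval, and the use of T5 are assumptions for illustration, not the PromptMize implementation.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def plan_prompt(table_rows):
    """Linearize a table (list of (attribute, value) pairs) into a guiding prompt."""
    facts = "; ".join(f"{attr} is {val}" for attr, val in table_rows)
    return f"Describe the entity given these facts: {facts}."

def retrieve_knowledge(table_rows, corpus):
    """Toy stand-in for the knowledge adapter: pick corpus sentences sharing a table value."""
    values = {val.lower() for _, val in table_rows}
    return [s for s in corpus if any(v in s.lower() for v in values)][:3]

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

table = [("name", "Ada Lovelace"), ("occupation", "mathematician")]
corpus = ["Ada Lovelace wrote the first published algorithm."]
prompt = plan_prompt(table) + " " + " ".join(retrieve_knowledge(table, corpus))
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))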

* Superseded: the content was revised and the article was renamed in a new submission

Few-Shot Table-to-Text Generation with Prompt-based Adapter

Feb 24, 2023
Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang

Pre-trained language models (PLMs) have made remarkable progress in table-to-text generation tasks. However, the topological gap between tabular data and text, together with the lack of domain-specific knowledge, makes it difficult for PLMs to produce faithful text, especially in real-world applications with limited resources. In this paper, we mitigate the above challenges by introducing a novel augmentation method, Prompt-based Adapter (PA), which targets table-to-text generation under few-shot conditions. The core design insight of PA is to inject prompt templates that augment domain-specific knowledge and table-related representations into the model, bridging the structural gap between tabular data and descriptions through adapters. This prompt-based knowledge augmentation brings at least two benefits: (1) it enables us to fully use the large amounts of unlabelled domain-specific knowledge, which alleviates the PLMs' inherent shortcoming of lacking domain knowledge; (2) it allows us to design different types of tasks supporting the generative challenge. Extensive experiments and analyses are conducted on three open-domain few-shot NLG datasets: Humans, Books, and Songs. Compared to previous state-of-the-art approaches, our model achieves superior performance in terms of both fluency and accuracy as judged by human and automatic evaluations.
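
For concreteness, below is a minimal PyTorch sketch of a Houlsby-style bottleneck adapter of the kind that could be inserted into a frozen PLM layer to absorb prompt- and knowledge-augmented representations. The dimensions, placement, and residual form are generic adapter conventions assumed for illustration, not necessarily the exact PA architecture.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden):
        # Residual bottleneck: only the small down/up projections are trained,
        # so the frozen PLM can be adapted with few-shot table-to-text data.
        return hidden + self.up(self.act(self.down(self.norm(hidden))))

h = torch.randn(2, 16, 768)       # (batch, seq, d_model) hidden states from a PLM layer
print(Adapter()(h).shape)         # torch.Size([2, 16, 768])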

* arXiv admin note: substantial text overlap with arXiv:2302.04415 

Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset

Feb 19, 2023
Jiexing Qi, Shuhao Li, Zhixin Guo, Yusheng Huang, Chenghu Zhou, Weinan Zhang, Xinbing Wang, Zhouhan Lin

Real-world data usually exhibits a long-tailed distribution, with a few frequent labels and many few-shot labels. The study of institution name normalization is a perfect application case showing this phenomenon. There are many institutions worldwide, with enormous variations of their names in the publicly available literature. In this work, we first collect a large-scale institution name normalization dataset, LoT-insts, which contains over 25k classes that exhibit a naturally long-tailed distribution. In order to isolate the few-shot and zero-shot learning scenarios from the massive many-shot classes, we construct our test set from four different subsets: many-, medium-, and few-shot sets, as well as a zero-shot open set. We also replicate several important baseline methods on our data, covering a wide range from search-based methods to neural network methods that use the pretrained BERT model. Further, we propose our specially pretrained, BERT-based model, which shows better out-of-distribution generalization on the few-shot and zero-shot test sets. Compared to other datasets focusing on the long-tailed phenomenon, our dataset has one order of magnitude more training data than the largest existing long-tailed datasets and is naturally long-tailed rather than manually synthesized. We believe it provides an important and different scenario in which to study this problem. To the best of our knowledge, this is the first natural language dataset that focuses on long-tailed and open-set classification problems.
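
The many-/medium-/few-shot and zero-shot open-set split can be reproduced in spirit by bucketing test classes by their training frequency, as in the short sketch below; the thresholds (100 and 20) are placeholders, since the dataset defines its own cutoffs.

from collections import Counter

def bucket_classes(train_labels, test_labels, many=100, few=20):
    freq = Counter(train_labels)
    buckets = {"many": set(), "medium": set(), "few": set(), "zero": set()}
    for label in set(test_labels):
        n = freq.get(label, 0)
        if n == 0:
            buckets["zero"].add(label)        # open-set classes unseen in training
        elif n < few:
            buckets["few"].add(label)
        elif n < many:
            buckets["medium"].add(label)
        else:
            buckets["many"].add(label)
    return buckets

print(bucket_classes(["MIT"] * 150 + ["SJTU"] * 30 + ["ETH"] * 5,
                     ["MIT", "SJTU", "ETH", "Unseen Inst."]))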

* A shorter version is accepted in ICASSP 2023 

Ordered GNN: Ordering Message Passing to Deal with Heterophily and Over-smoothing

Feb 03, 2023
Yunchong Song, Chenghu Zhou, Xinbing Wang, Zhouhan Lin

Most graph neural networks follow the message passing mechanism. However, it faces the over-smoothing problem when message passing is applied to a graph multiple times, causing indistinguishable node representations and preventing the model from effectively learning dependencies between farther-away nodes. On the other hand, features of neighboring nodes with different labels are likely to be falsely mixed, resulting in the heterophily problem. In this work, we propose to order the messages passed into the node representation, with specific blocks of neurons targeted for message passing within specific hops. This is achieved by aligning the hierarchy of the rooted tree of a central node with the ordered neurons in its node representation. Experimental results on an extensive set of datasets show that our model can simultaneously achieve the state of the art in both homophily and heterophily settings, without any targeted design. Moreover, its performance holds up well as the model becomes very deep, effectively preventing the over-smoothing problem. Finally, visualizing the gating vectors shows that our model learns to behave differently between homophily and heterophily settings, providing an explainable graph neural model.
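
The ordering idea can be illustrated with a short PyTorch sketch in which a monotone (cumulative-softmax) gate decides, neuron by neuron, whether to keep the current representation or to overwrite it with the newly aggregated hop's message, so earlier feature blocks end up tied to nearby hops and later blocks to farther hops. The gate parameterization and the mean-style aggregator are illustrative choices, not the exact Ordered GNN layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OrderedMessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h, adj_norm, num_hops=3):
        for _ in range(num_hops):
            m = adj_norm @ h                                # aggregate messages from one more hop
            # Monotonically non-decreasing gate in [0, 1] along the feature dimension.
            g = torch.cumsum(F.softmax(self.gate(torch.cat([h, m], dim=-1)), dim=-1), dim=-1)
            # Early dims (gate near 0) keep the current representation; later dims absorb the new hop.
            h = (1 - g) * h + g * m
        return h

h = torch.randn(5, 8)
adj = torch.rand(5, 5)
adj = adj / adj.sum(dim=1, keepdim=True)                    # row-normalized adjacency
print(OrderedMessagePassing(8)(h, adj).shape)               # torch.Size([5, 8])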

* Published as a conference paper at ICLR 2023 