Current approaches to question answering rely on pre-trained language models (PLMs) like RoBERTa. This work challenges the existing question-answer encoding convention and explores finer representations. We begin by testing various pooling methods against the convention of using the begin-of-sentence token as the question representation, with the aim of improving representation quality. Next, we explore opportunities to simultaneously embed all answer candidates with the question. This enables cross-reference between answer choices and improves inference throughput via reduced memory usage. Despite their simplicity and effectiveness, these methods have yet to be widely studied in current frameworks. We experiment with different PLMs, with and without the integration of knowledge graphs. Results demonstrate the memory efficiency of the proposed techniques with little sacrifice in performance. Practically, our work improves throughput by 38-100% with 26-65% speedups on consumer-grade GPUs by allowing for considerably larger batch sizes. Our work sends a message to the community with promising directions in both representation quality and efficiency for the question-answering task in natural language processing.
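The contrast between the begin-of-sentence representation and a pooled one can be illustrated with a minimal sketch. It assumes the encoder output is available as a list of per-token vectors with a padding mask; the function names `first_token_pool` and `mean_pool` are hypothetical, and mean pooling is only one plausible instance of the "various pooling methods" the abstract mentions.

```python
from typing import List


def first_token_pool(hidden: List[List[float]]) -> List[float]:
    """Use the first (begin-of-sentence) token vector as the representation."""
    return list(hidden[0])


def mean_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [t / count for t in total]


# Toy encoder output: four token vectors, the last one is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

bos_repr = first_token_pool(hidden)
mean_repr = mean_pool(hidden, mask)
```

Here `bos_repr` depends on a single position, while `mean_repr` aggregates every non-padding token, which is the kind of difference the pooling comparison is meant to probe.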
With the increasing prevalence of smartphones and websites, Image Aesthetic Assessment (IAA) has become increasingly crucial. While the significance of attributes in IAA is widely recognized, many attribute-based methods lack consideration for the selection and utilization of aesthetic attributes. Our initial step involves the acquisition of aesthetic attributes from both intra- and inter-perspectives. Within the intra-perspective, we extract the direct visual attributes of images, constituting the absolute attribute. In the inter-perspective, our focus lies in modeling the relative score relationships between images within the same sequence, forming the relative attribute. Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images. For absolute attributes, we leverage multiple absolute-attribute perception modules and an absolute-attribute interacting network. The absolute-attribute perception modules are first pre-trained on several absolute-attribute learning tasks and then used to extract corresponding absolute attribute features. The absolute-attribute interacting network adaptively learns the weight of diverse absolute-attribute features, effectively integrating them with generic aesthetic features from various absolute-attribute perspectives and generating the aesthetic prediction. To model the relative attribute of images, we consider the relative ranking and relative distance relationships between images in a Relative-Relation Loss function, which boosts the robustness of the UMAAF. Furthermore, UMAAF achieves state-of-the-art performance on TAD66K and AVA datasets, and multiple experiments demonstrate the effectiveness of each module and the model's alignment with human preference.
Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate complex reasoning and common-sense responses. Despite the growing number of datasets that aim to answer questions over charts, most only address this task in isolation, without considering the broader context of document-level question answering. Moreover, such datasets lack adequate common-sense reasoning information in their questions. In this work, we introduce a novel task named document-level chart question answering (DCQA). The goal of this task is to conduct document-level question answering, extracting charts or plots in the document via document layout analysis (DLA) first and subsequently performing chart question answering (CQA). The newly developed benchmark dataset comprises 50,010 synthetic documents integrating charts in a wide range of styles (6 styles in contrast to 3 for PlotQA and ChartQA) and includes 699,051 questions that demand a high degree of reasoning ability and common-sense understanding. Besides, we present the development of a potent question-answer generation engine that employs table data, a rich color set, and basic question templates to produce a vast array of reasoning question-answer pairs automatically. Based on DCQA, we devise an OCR-free transformer for document-level chart-oriented understanding, capable of DLA and answering complex reasoning and common-sense questions over charts in an OCR-free manner. Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document. We implement and evaluate a set of baselines, and our proposed method achieves comparable results.
Spectral Graph Neural Networks (GNNs) are gaining attention because they can surpass the limitations of message-passing GNNs by learning spectral filters that capture essential frequency information in graph data through task supervision. However, previous research suggests that the choice of filter frequency is tied to the graph's homophily level, a connection that hasn't been thoroughly explored in existing spectral GNNs. To address this gap, the study conducts both theoretical and empirical analyses, revealing that low-frequency filters have a positive correlation with homophily, while high-frequency filters have a negative correlation. This leads to the introduction of a shape-aware regularization technique applied to a Newton Interpolation-based spectral filter, enabling the customization of polynomial spectral filters that align with desired homophily levels. Extensive experiments demonstrate that NewtonNet successfully achieves the desired filter shapes and exhibits superior performance on both homophilous and heterophilous datasets.
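As an illustration of what a Newton interpolation-based polynomial filter looks like, the sketch below builds a Newton-form polynomial through a few (frequency, response) pairs and evaluates it at an arbitrary frequency. The interpolation points, the target responses, and the function names are assumptions made for the example; how NewtonNet chooses or learns these values, and how the shape-aware regularization is applied, is not specified in the abstract.

```python
from typing import List


def divided_differences(xs: List[float], ys: List[float]) -> List[float]:
    """Return Newton-form coefficients a_0..a_n for the points (xs, ys)."""
    coef = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Update in place from the end so lower-order differences stay available.
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef


def newton_eval(xs: List[float], coef: List[float], lam: float) -> float:
    """Evaluate the Newton polynomial at frequency lam with Horner's scheme."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (lam - xs[k]) + coef[k]
    return result


# Assumed interpolation points on [0, 2] (the normalized-Laplacian spectrum range)
# and an assumed low-pass target response that decays with frequency.
points = [0.0, 1.0, 2.0]
responses = [1.0, 0.5, 0.0]

coef = divided_differences(points, responses)
mid_response = newton_eval(points, coef, 0.5)
```

Because the polynomial passes through the chosen responses, adjusting those responses directly reshapes the filter, which is what makes this form convenient for steering a filter toward a low-pass or high-pass shape.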
Graph Neural Networks (GNNs) have achieved state-of-the-art performance for link prediction. However, GNNs suffer from poor interpretability, which limits their adoption in critical scenarios that require knowing why certain links are predicted. Despite various methods proposed for the explainability of GNNs, most of them are post-hoc explainers developed for explaining node classification. Directly adopting existing post-hoc explainers for explaining link prediction is sub-optimal because: (i) post-hoc explainers usually adopt another strategy or model to explain a target model, which could misinterpret the target model; and (ii) GNN explainers for node classification identify crucial subgraphs around each node for the explanation, whereas for link prediction, one needs to explain the prediction for each pair of nodes based on graph structure and node attributes. Therefore, in this paper, we study a novel problem of self-explainable GNNs for link prediction, which can simultaneously give accurate predictions and explanations. Concretely, we propose a new framework that can find various $K$ important neighbors of one node to learn pair-specific representations for links from this node to other nodes. These $K$ different neighbors represent important characteristics of the node and model various factors for links from it. Thus, the $K$ neighbors can provide explanations for the existence of links. Experiments on both synthetic and real-world datasets verify the effectiveness of the proposed framework for link prediction and explanation.
Graph neural networks (GNNs) have achieved great success in various graph problems. However, most GNNs are Message Passing Neural Networks (MPNNs) based on the homophily assumption, where nodes with the same label are connected in graphs. Real-world graphs often exhibit heterophily, where nodes with different labels are connected. MPNNs fail to address the heterophily problem because they mix information from different distributions and are not good at capturing global patterns. Therefore, in this paper we investigate a novel Graph Memory Networks model on Heterophilous Graphs (HP-GMN) to address the heterophily problem. In HP-GMN, local information and global patterns are learned by local statistics and the memory to facilitate the prediction. We further propose regularization terms to help the memory learn global information. We conduct extensive experiments to show that our method achieves state-of-the-art performance on both homophilous and heterophilous graphs.
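The homophily and heterophily notions above can be made concrete with a small sketch that measures the fraction of edges joining same-label nodes. This edge homophily ratio is a commonly used measure and is assumed here for illustration; the abstract does not state which measure the paper uses, and the function name `edge_homophily` is hypothetical.

```python
from typing import List, Sequence, Tuple


def edge_homophily(edges: Sequence[Tuple[int, int]], labels: List[int]) -> float:
    """Fraction of edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle: two of the four edges connect nodes with the same label.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]

ratio = edge_homophily(edges, labels)
```

A ratio near 1 corresponds to the homophilous setting that message passing assumes, and a ratio near 0 to the heterophilous setting the abstract targets.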
Deep learning has shown human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably degrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can also be applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches, even those that keep some images of old classes during continual learning of new classes.
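The idea of storing each class as a statistical distribution over fixed features can be sketched as follows. This is a deliberately simplified stand-in, not the paper's model: it assumes one diagonal Gaussian per class instead of a Gaussian mixture, a uniform class prior, and features that have already been extracted by the frozen network. The class name `GaussianClassMemory` and its methods are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple


class GaussianClassMemory:
    """Stores one diagonal Gaussian per class over fixed, pre-extracted features."""

    def __init__(self, var_floor: float = 1e-6) -> None:
        self.stats: Dict[Hashable, Tuple[List[float], List[float]]] = {}
        self.var_floor = var_floor  # keeps variances strictly positive

    def add_class(self, label: Hashable, features: List[List[float]]) -> None:
        """Fit mean and variance for a new class; existing classes are untouched."""
        n = len(features)
        dim = len(features[0])
        mean = [sum(f[d] for f in features) / n for d in range(dim)]
        var = [
            max(sum((f[d] - mean[d]) ** 2 for f in features) / n, self.var_floor)
            for d in range(dim)
        ]
        self.stats[label] = (mean, var)

    def log_likelihood(self, label: Hashable, x: List[float]) -> float:
        """Log density of x under the stored diagonal Gaussian for this class."""
        mean, var = self.stats[label]
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )

    def predict(self, x: List[float]) -> Hashable:
        """Pick the class with the highest likelihood (uniform prior assumed)."""
        return max(self.stats, key=lambda c: self.log_likelihood(c, x))


memory = GaussianClassMemory()
memory.add_class("a", [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]])
stats_a_before = memory.stats["a"]
# A later learning step adds a new class without revisiting data of class "a".
memory.add_class("b", [[5.0, 5.0], [5.2, 4.8], [4.8, 5.2]])
```

Since adding class "b" never rewrites the statistics stored for class "a", old knowledge is retained by construction in this sketch, which mirrors the abstract's claim that per-class distributions are naturally kept from forgetting.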
Graph Neural Networks (GNNs) have developed rapidly in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential to benefit humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations on the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give the taxonomy of the related methods and formulate the general frameworks for the multiple categories of trustworthy GNNs. We also discuss the future research directions of each aspect and connections between these aspects to help achieve trustworthiness.
We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is a primitive, which is an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. Users can freely construct a pipeline using these primitives and perform end-to-end outlier detection with the constructed pipeline. TODS provides a Graphical User Interface (GUI), where users can flexibly design a pipeline with drag-and-drop. Moreover, a data-driven searcher is provided to automatically discover the most suitable pipelines given a dataset. TODS is released under the Apache 2.0 license at https://github.com/datamllab/tods.