Unsupervised domain adaptation aims to transfer rich knowledge from the annotated source domain to the unlabeled target domain with the same label space. One prevalent solution is the bi-discriminator domain adversarial network, which strives to identify target domain samples outside the support of the source domain distribution and enforces their classification to be consistent on both discriminators. Despite being effective, agnostic accuracy and overconfident estimation for out-of-distribution samples hinder its further performance improvement. To address the above challenges, we propose a novel bi-discriminator domain adversarial neural network with class-level gradient alignment, i.e. BACG. BACG resorts to gradient signals and second-order probability estimation for better alignment of domain distributions. Specifically, for accuracy-awareness, we first design an optimizable nearest neighbor algorithm to obtain pseudo-labels of samples in the target domain, and then enforce the backward gradient approximation of the two discriminators at the class level. Furthermore, following evidential learning theory, we transform the traditional softmax-based optimization method into a Multinomial Dirichlet hierarchical model to infer the class probability distribution as well as samples uncertainty, thereby alleviating misestimation of out-of-distribution samples and guaranteeing high-quality classes alignment. In addition, inspired by contrastive learning, we develop a memory bank-based variant, i.e. Fast-BACG, which can greatly shorten the training process at the cost of a minor decrease in accuracy. Extensive experiments and detailed theoretical analysis on four benchmark data sets validate the effectiveness and robustness of our algorithm.
* False Figure citation. For instance, Figure 8 hasn't been cited
Cognitive diagnosis aims to diagnose students' knowledge proficiencies based on their response scores on exam questions, which is the basis of many domains such as computerized adaptive testing. Existing cognitive diagnosis models (CDMs) follow a proficiency-response paradigm, which views diagnostic results as learnable embeddings that are the cause of students' responses and learns the diagnostic results through optimization. However, such a paradigm can easily lead to unidentifiable diagnostic results and the explainability overfitting problem, which is harmful to the quantification of students' learning performance. To address these problems, we propose a novel identifiable cognitive diagnosis framework. Specifically, we first propose a flexible diagnostic module which directly diagnose identifiable and explainable examinee traits and question features from response logs. Next, we leverage a general predictive module to reconstruct response logs from the diagnostic results to ensure the preciseness of the latter. We furthermore propose an implementation of the framework, i.e., ID-CDM, to demonstrate the availability of the former. Finally, we demonstrate the identifiability, explainability and preciseness of diagnostic results of ID-CDM through experiments on four public real-world datasets.
Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, due to the insufficient measurement of traditional metrics (e.g. the coarse-grained Accuracy of each classifier), substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by the psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be simultaneously quantified. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only can capture the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. For comparing these abilities of different models, several benchmarks (i.e. sets of standard test questions) from different fields (e.g., Literature, Biology and Psychology) are often adopted and the test results under traditional metrics such as accuracy, recall and F1, are reported. However, such way for evaluating LLMs can be inefficient and inaccurate from the cognitive science perspective. Inspired by Computerized Adaptive Testing (CAT) used in psychometrics, we propose an adaptive testing framework for LLM evaluation. Rather than using a standard test set and simply reporting accuracy, this approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance. This allows for a more accurate estimation of the model's abilities, using fewer questions. More importantly, it allows LLMs to be compared with humans easily, which is essential for NLP models that aim for human-level ability. Our diagnostic reports have found that ChatGPT often behaves like a ``careless student'', prone to slip and occasionally guessing the questions. We conduct a fine-grained diagnosis and rank the latest 6 instruction-tuned LLMs from three aspects of Subject Knowledge, Mathematical Reasoning, and Programming, where GPT4 can outperform other models significantly and reach the cognitive ability of middle-level students. Different tests for different models using efficient adaptive testing -- we believe this has the potential to become a new norm in evaluating large language models.
Knowledge tracing (KT) aims to assess individuals' evolving knowledge states according to their learning interactions with different exercises in online learning systems (OIS), which is critical in supporting decision-making for subsequent intelligent services, such as personalized learning source recommendation. Existing researchers have broadly studied KT and developed many effective methods. However, most of them assume that students' historical interactions are uniformly distributed in a continuous sequence, ignoring the fact that actual interaction sequences are organized based on a series of quizzes with clear boundaries, where interactions within a quiz are consecutively completed, but interactions across different quizzes are discrete and may be spaced over days. In this paper, we present the Quiz-based Knowledge Tracing (QKT) model to monitor students' knowledge states according to their quiz-based learning interactions. Specifically, as students' interactions within a quiz are continuous and have the same or similar knowledge concepts, we design the adjacent gate followed by a global average pooling layer to capture the intra-quiz short-term knowledge influence. Then, as various quizzes tend to focus on different knowledge concepts, we respectively measure the inter-quiz knowledge substitution by the gated recurrent unit and the inter-quiz knowledge complementarity by the self-attentive encoder with a novel recency-aware attention mechanism. Finally, we integrate the inter-quiz long-term knowledge substitution and complementarity across different quizzes to output students' evolving knowledge states. Extensive experimental results on three public real-world datasets demonstrate that QKT achieves state-of-the-art performance compared to existing methods. Further analyses confirm that QKT is promising in designing more effective quizzes.
Mathematical reasoning is one of the crucial abilities of general artificial intelligence, which requires machines to master mathematical logic and knowledge from solving problems. However, existing approaches are not transparent (thus not interpretable) in terms of what knowledge has been learned and applied in the reasoning process. In this paper, we propose a general Learning by Applying (LeAp) framework to enhance existing models (backbones) in a principled way by explicit knowledge learning. In LeAp, we perform knowledge learning in a novel problem-knowledge-expression paradigm, with a Knowledge Encoder to acquire knowledge from problem data and a Knowledge Decoder to apply knowledge for expression reasoning. The learned mathematical knowledge, including word-word relations and word-operator relations, forms an explicit knowledge graph, which bridges the knowledge "learning" and "applying" organically. Moreover, for problem solving, we design a semantics-enhanced module and a reasoning-enhanced module that apply knowledge to improve the problem comprehension and symbol reasoning abilities of any backbone, respectively. We theoretically prove the superiority of LeAp's autonomous learning mechanism. Experiments on three real-world datasets show that LeAp improves all backbones' performances, learns accurate knowledge, and achieves a more interpretable reasoning process.
In the Natural Language for Optimization (NL4Opt) NeurIPS 2022 competition, competitors focus on improving the accessibility and usability of optimization solvers, with the aim of subtask 1: recognizing the semantic entities that correspond to the components of the optimization problem; subtask 2: generating formulations for the optimization problem. In this paper, we present the solution of our team. First, we treat subtask 1 as a named entity recognition (NER) problem with the solution pipeline including pre-processing methods, adversarial training, post-processing methods and ensemble learning. Besides, we treat subtask 2 as a generation problem with the solution pipeline including specially designed prompts, adversarial training, post-processing methods and ensemble learning. Our proposed methods have achieved the F1-score of 0.931 in subtask 1 and the accuracy of 0.867 in subtask 2, which won the fourth and third places respectively in this competition. Our code is available at https://github.com/bigdata-ustc/nl4opt.
Understanding mathematical questions effectively is a crucial task, which can benefit many applications, such as difficulty estimation. Researchers have drawn much attention to designing pre-training models for question representations due to the scarcity of human annotations (e.g., labeling difficulty). However, unlike general free-format texts (e.g., user comments), mathematical questions are generally designed with explicit purposes and mathematical logic, and usually consist of more complex content, such as formulas, and related mathematical knowledge (e.g., Function). Therefore, the problem of holistically representing mathematical questions remains underexplored. To this end, in this paper, we propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo, which attempts to bring questions with more similar purposes closer. Specifically, we first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes. Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy (KHAR), which ranks the similarities between questions in a fine-grained manner. Next, we adopt a ranking contrastive learning task to optimize our model based on the augmented and ranked questions. We conduct extensive experiments on two real-world mathematical datasets. The experimental results demonstrate the effectiveness of our model.
Many data mining tasks rely on graphs to model relational structures among individuals (nodes). Since relational data are often sensitive, there is an urgent need to evaluate the privacy risks in graph data. One famous privacy attack against data analysis models is the model inversion attack, which aims to infer sensitive data in the training dataset and leads to great privacy concerns. Despite its success in grid-like domains, directly applying model inversion attacks on non-grid domains such as graph leads to poor attack performance. This is mainly due to the failure to consider the unique properties of graphs. To bridge this gap, we conduct a systematic study on model inversion attacks against Graph Neural Networks (GNNs), one of the state-of-the-art graph analysis tools in this paper. Firstly, in the white-box setting where the attacker has full access to the target GNN model, we present GraphMI to infer the private training graph data. Specifically, in GraphMI, a projected gradient module is proposed to tackle the discreteness of graph edges and preserve the sparsity and smoothness of graph features; a graph auto-encoder module is used to efficiently exploit graph topology, node attributes, and target model parameters for edge inference; a random sampling module can finally sample discrete edges. Furthermore, in the hard-label black-box setting where the attacker can only query the GNN API and receive the classification results, we propose two methods based on gradient estimation and reinforcement learning (RL-GraphMI). Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
* Accepted by TKDE. arXiv admin note: substantial text overlap with