Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huiqi Deng

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Aug 11, 2025

Huiqi Deng, Hongbin Pei, Quanshi Zhang, Mengnan Du

Abstract:Attribution explanation is a typical approach for explaining deep neural networks (DNNs), inferring an importance or contribution score for each input variable to the final output. In recent years, numerous attribution methods have been developed to explain DNNs. However, a persistent concern remains unresolved, i.e., whether and which attribution methods faithfully reflect the actual contribution of input variables to the decision-making process. The faithfulness issue undermines the reliability and practical utility of attribution explanations. We argue that these concerns stem from three core challenges. First, difficulties arise in comparing attribution methods due to their unstructured heterogeneity, differences in heuristics, formulations, and implementations that lack a unified organization. Second, most methods lack solid theoretical underpinnings, with their rationales remaining absent, ambiguous, or unverified. Third, empirically evaluating faithfulness is challenging without ground truth. Recent theoretical advances provide a promising way to tackle these challenges, attracting increasing attention. We summarize these developments, with emphasis on three key directions: (i) Theoretical unification, which uncovers commonalities and differences among methods, enabling systematic comparisons; (ii) Theoretical rationale, clarifying the foundations of existing methods; (iii) Theoretical evaluation, rigorously proving whether methods satisfy faithfulness principles. Beyond a comprehensive review, we provide insights into how these studies help deepen theoretical understanding, inform method selection, and inspire new attribution methods. We conclude with a discussion of promising open problems for further work.

Via

Access Paper or Ask Questions

Towards Attributions of Input Variables in a Coalition

Sep 23, 2023

Xinhao Zheng, Huiqi Deng, Quanshi Zhang

Figure 1 for Towards Attributions of Input Variables in a Coalition

Abstract:This paper aims to develop a new attribution method to explain the conflict between individual variables' attributions and their coalition's attribution from a fully new perspective. First, we find that the Shapley value can be reformulated as the allocation of Harsanyi interactions encoded by the AI model. Second, based the re-alloction of interactions, we extend the Shapley value to the attribution of coalitions. Third we ective. We derive the fundamental mechanism behind the conflict. This conflict come from the interaction containing partial variables in their coalition.

Via

Access Paper or Ask Questions

Explainability for Large Language Models: A Survey

Sep 17, 2023

Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du

Abstract:Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.

Via

Access Paper or Ask Questions

Mitigating Shortcuts in Language Models with Soft Label Encoding

Sep 17, 2023

Zirui He, Huiqi Deng, Haiyan Zhao, Ninghao Liu, Mengnan Du

Figure 1 for Mitigating Shortcuts in Language Models with Soft Label Encoding

Figure 2 for Mitigating Shortcuts in Language Models with Soft Label Encoding

Figure 3 for Mitigating Shortcuts in Language Models with Soft Label Encoding

Figure 4 for Mitigating Shortcuts in Language Models with Soft Label Encoding

Abstract:Recent research has shown that large language models rely on spurious correlations in the data for natural language understanding (NLU) tasks. In this work, we aim to answer the following research question: Can we reduce spurious correlations by modifying the ground truth labels of the training data? Specifically, we propose a simple yet effective debiasing framework, named Soft Label Encoding (SoftLE). We first train a teacher model with hard labels to determine each sample's degree of relying on shortcuts. We then add one dummy class to encode the shortcut degree, which is used to smooth other dimensions in the ground truth label to generate soft labels. This new ground truth label is used to train a more robust student model. Extensive experiments on two NLU benchmark tasks demonstrate that SoftLE significantly improves out-of-distribution generalization while maintaining satisfactory in-distribution accuracy.

Via

Access Paper or Ask Questions

Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Mar 06, 2023

Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang, Zheyang Li, Quanshi Zhang

Figure 1 for Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Figure 2 for Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Figure 3 for Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Figure 4 for Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Abstract:Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics. There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this end, for the first time, we formulate core mechanisms of fourteen attribution methods, which were designed on different heuristics, into the same mathematical system, i.e., the system of Taylor interactions. Specifically, we prove that attribution scores estimated by fourteen attribution methods can all be reformulated as the weighted sum of two types of effects, i.e., independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods mainly lies in the weights of allocating different effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods.

Via

Access Paper or Ask Questions

Concept-Level Explanation for the Generalization of a DNN

Feb 25, 2023

Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, Quanshi Zhang

Figure 1 for Concept-Level Explanation for the Generalization of a DNN

Figure 2 for Concept-Level Explanation for the Generalization of a DNN

Figure 3 for Concept-Level Explanation for the Generalization of a DNN

Figure 4 for Concept-Level Explanation for the Generalization of a DNN

Abstract:This paper explains the generalization power of a deep neural network (DNN) from the perspective of interactive concepts. Many recent studies have quantified a clear emergence of interactive concepts encoded by the DNN, which have been observed on different DNNs during the learning process. Therefore, in this paper, we investigate the generalization power of each interactive concept, and we use the generalization power of different interactive concepts to explain the generalization power of the entire DNN. Specifically, we define the complexity of each interactive concept. We find that simple concepts can be better generalized to testing data than complex concepts. The DNN with strong generalization power usually learns simple concepts more quickly and encodes fewer complex concepts. More crucially, we discover the detouring dynamics of learning complex concepts, which explain both the high learning difficulty and the low generalization power of complex concepts.

Via

Access Paper or Ask Questions

Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Feb 25, 2023

Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, Quanshi Zhang

Figure 1 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 2 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 3 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 4 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Abstract:In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that compared to standard deep neural networks (DNNs), it is less likely for BNNs to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability.

Via

Access Paper or Ask Questions

Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Nov 30, 2021

Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, Quanshi Zhang

Figure 1 for Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Figure 2 for Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Figure 3 for Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Figure 4 for Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Abstract:This paper proposes a hierarchical and symbolic And-Or graph (AOG) to objectively explain the internal logic encoded by a well-trained deep model for inference. We first define the objectiveness of an explainer model in game theory, and we develop a rigorous representation of the And-Or logic encoded by the deep model. The objectiveness and trustworthiness of the AOG explainer are both theoretically guaranteed and experimentally verified. Furthermore, we propose several techniques to boost the conciseness of the explanation.

Via

Access Paper or Ask Questions

Discovering and Explaining the Representation Bottleneck of DNNs

Nov 18, 2021

Huiqi Deng, Qihan Ren, Xu Chen, Hao Zhang, Jie Ren, Quanshi Zhang

Figure 1 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 2 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 3 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 4 for Discovering and Explaining the Representation Bottleneck of DNNs

Abstract:This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both too simple interactions and too complex interactions, but usually fails to learn interactions of intermediate complexity. Such a phenomenon is widely shared by different DNNs for different tasks. This phenomenon indicates a cognition gap between DNNs and human beings, and we call it a representation bottleneck. We theoretically prove the underlying reason for the representation bottleneck. Furthermore, we propose a loss to encourage/penalize the learning of interactions of specific complexities, and analyze the representation capacities of interactions of different complexities.

Via

Access Paper or Ask Questions

A General Taylor Framework for Unifying and Revisiting Attribution Methods

May 28, 2021

Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

Figure 1 for A General Taylor Framework for Unifying and Revisiting Attribution Methods

Figure 2 for A General Taylor Framework for Unifying and Revisiting Attribution Methods

Figure 3 for A General Taylor Framework for Unifying and Revisiting Attribution Methods

Figure 4 for A General Taylor Framework for Unifying and Revisiting Attribution Methods

Abstract:Attribution methods provide an insight into the decision-making process of machine learning models, especially deep neural networks, by assigning contribution scores to each individual feature. However, the attribution problem has not been well-defined, which lacks a unified guideline to the contribution assignment process. Furthermore, existing attribution methods often built upon various empirical intuitions and heuristics. There still lacks a general theoretical framework that not only can offer a good description of the attribution problem, but also can be applied to unifying and revisiting existing attribution methods. To bridge the gap, in this paper, we propose a Taylor attribution framework, which models the attribution problem as how to decide individual payoffs in a coalition. Then, we reformulate fourteen mainstream attribution methods into the Taylor framework and analyze these attribution methods in terms of rationale, fidelity, and limitation in the framework. Moreover, we establish three principles for a good attribution in the Taylor attribution framework, i.e., low approximation error, correct Taylor contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and reveal a positive correlation between the attribution performance and the number of principles followed by the attribution method via benchmarking on real-world datasets.

Via

Access Paper or Ask Questions