Quanshi Zhang

A Roadmap for Big Model

Apr 02, 2022

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He(+90 more)

Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in their application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. For each topic, we summarize the current studies and propose some future research directions. At the end of this paper, we conclude with a more general view of the further development of BMs.

* arXiv admin note: text overlap with arXiv:2107.06499 by other authors

Trap of Feature Diversity in the Learning of MLPs

Dec 02, 2021

Dongrui Liu, Shaobo Wang, Jie Ren, Kangrui Wang, Sheng Yin, Quanshi Zhang

Abstract: In this paper, we discover a two-phase phenomenon in the learning of multi-layer perceptrons (MLPs). That is, in the first phase, the training loss does not decrease significantly, but the similarity of features between different samples keeps increasing, which hurts feature diversity. We explain this two-phase phenomenon in terms of the learning dynamics of the MLP. Furthermore, we propose two normalization operations to eliminate the two-phase phenomenon, which avoids the decrease of feature diversity and speeds up the training process.
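
One way to make "similarity of features between different samples" concrete is the mean pairwise cosine similarity over a batch of feature vectors. This is a minimal sketch under that assumption: the abstract does not say which similarity measure the authors use, and the function names and toy feature vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(features):
    """Average cosine similarity over all pairs of distinct samples.

    Values near 1 mean the features of different samples point in
    almost the same direction, i.e. low feature diversity.
    """
    pairs = list(combinations(features, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical features: one diverse batch, one nearly collapsed batch.
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
collapsed = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9]]

diverse_score = mean_pairwise_cosine(diverse)
collapsed_score = mean_pairwise_cosine(collapsed)
```

Tracking such a score alongside the training loss would show whether similarity rises while the loss stays flat, which is the pattern the abstract attributes to the first phase.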

Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models

Nov 30, 2021

Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, Quanshi Zhang

Abstract: This paper proposes a hierarchical and symbolic And-Or graph (AOG) to objectively explain the internal logic encoded by a well-trained deep model for inference. We first define the objectiveness of an explainer model in game theory, and we develop a rigorous representation of the And-Or logic encoded by the deep model. The objectiveness and trustworthiness of the AOG explainer are both theoretically guaranteed and experimentally verified. Furthermore, we propose several techniques to boost the conciseness of the explanation.

Discovering and Explaining the Representation Bottleneck of DNNs

Nov 18, 2021

Huiqi Deng, Qihan Ren, Xu Chen, Hao Zhang, Jie Ren, Quanshi Zhang

Abstract: This paper explores the bottleneck of feature representations of deep neural networks (DNNs) from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both overly simple and overly complex interactions, but usually fails to learn interactions of intermediate complexity. This phenomenon is widely shared by different DNNs for different tasks. It indicates a cognition gap between DNNs and human beings, and we call it a representation bottleneck. We theoretically prove the underlying reason for the representation bottleneck. Furthermore, we propose a loss to encourage or penalize the learning of interactions of specific complexities, and analyze the representation capacities of interactions of different complexities.
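
As a rough illustration of what an interaction "order" can mean, the sketch below uses a definition common in this line of work; this is an assumption, since the abstract does not state the formula. The order-m interaction between variables i and j is taken as the average of f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S) over all contexts S of size m drawn from the remaining variables. The function name and the toy model are hypothetical, and the exact enumeration is only feasible for a handful of variables.

```python
from itertools import combinations


def multi_order_interaction(f, variables, i, j, m):
    """Average interaction between i and j over contexts of size m.

    f maps a frozenset of "present" variables to a scalar output.
    """
    context_pool = [v for v in variables if v not in (i, j)]
    if not 0 <= m <= len(context_pool):
        raise ValueError("order m must be between 0 and len(variables) - 2")
    total, count = 0.0, 0
    for subset in combinations(context_pool, m):
        s = frozenset(subset)
        # Marginal effect of i and j acting together, beyond their solo effects.
        total += f(s | {i, j}) - f(s | {i}) - f(s | {j}) + f(s)
        count += 1
    return total / count


def toy_model(present):
    """Hypothetical model: fires only when variables 0, 1 and 2 are all present."""
    return 1.0 if {0, 1, 2} <= present else 0.0


variables = [0, 1, 2, 3]
orders = [multi_order_interaction(toy_model, variables, 0, 1, m) for m in range(3)]
```

In this toy case the interaction between variables 0 and 1 is zero at order 0 and grows with the order, because the pair only matters once variable 2 is part of the context.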

A Unified Game-Theoretic Interpretation of Adversarial Robustness

Nov 08, 2021

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi(+1 more)

Abstract: This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. Our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features.

* the previous version is arXiv:2103.07364, but I mistakenly apply a new ID for the paper

Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Nov 05, 2021

Wen Shen, Qihan Ren, Dongrui Liu, Quanshi Zhang

Abstract: In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to rotation, translation, scale, and local 3D structures. We also propose metrics to evaluate the spatial smoothness of encoding 3D structures and the representation complexity of the DNN. Based on this analysis, experiments expose representation problems with classic DNNs and explain the utility of adversarial training.

Visualizing the Emergence of Intermediate Visual Patterns in DNNs

Nov 05, 2021

Mingjie Li, Shaobo Wang, Quanshi Zhang

Abstract: This paper proposes a method to visualize the discrimination power of intermediate-layer visual patterns encoded by a DNN. Specifically, we visualize (1) how the DNN gradually learns regional visual patterns in each intermediate layer during the training process, and (2) the effects of the DNN using non-discriminative patterns in low layers to construct discriminative patterns in middle/high layers through the forward propagation. Based on our visualization method, we can quantify knowledge points (i.e., the number of discriminative visual patterns) learned by the DNN to evaluate the representation capacity of the DNN. Furthermore, this method also provides new insights into signal-processing behaviors of existing deep-learning techniques, such as adversarial attacks and knowledge distillation.

Rapid detection and recognition of whole brain activity in a freely behaving Caenorhabditis elegans

Sep 23, 2021

Yuxiang Wu, Shang Wu, Xin Wang, Chengtian Lang, Quanshi Zhang, Quan Wen, Tianqi Xu

Abstract: Advanced volumetric imaging methods and genetically encoded activity indicators have permitted a comprehensive characterization of whole brain activity at single neuron resolution in Caenorhabditis elegans. The constant motion and deformation of the worm nervous system, however, impose a great challenge for a consistent identification of densely packed neurons in a behaving animal. Here, we propose a cascade solution for long-term and rapid recognition of head ganglion neurons in a freely moving C. elegans. First, potential neuronal regions from a stack of fluorescence images are detected by a deep learning algorithm. Second, 2-dimensional neuronal regions are fused into 3-dimensional neuron entities. Third, by exploiting the neuronal density distribution surrounding a neuron and relative positional information between neurons, a multi-class artificial neural network transforms engineered neuronal feature vectors into digital neuronal identities. Under the constraint of a small number (20-40 volumes) of training samples, our bottom-up approach is able to process each volume (1024 × 1024 × 18 voxels) in less than 1 second and achieves an accuracy of 91% in neuronal detection and 74% in neuronal recognition. Our work represents an important development towards a rapid and fully automated algorithm for decoding whole brain activity underlying natural animal behaviors.

Interpreting Attributions and Interactions of Adversarial Attacks

Aug 16, 2021

Xin Wang, Shuyun Lin, Hao Zhang, Yufei Zhu, Quanshi Zhang

Abstract: This paper aims to explain adversarial attacks in terms of how adversarial perturbations contribute to the attacking task. We estimate attributions of different image regions to the decrease of the attacking cost based on the Shapley value. We define and quantify interactions among adversarial perturbation pixels, and decompose the entire perturbation map into relatively independent perturbation components. The decomposition of the perturbation map shows that adversarially-trained DNNs have more perturbation components in the foreground than normally-trained DNNs. Moreover, compared to the normally-trained DNN, the adversarially-trained DNN has more components that mainly decrease the score of the true category. These analyses provide new insights into the understanding of adversarial attacks.
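
For readers unfamiliar with the Shapley value mentioned above, the following is a minimal sketch of its standard definition, computed by exact enumeration over a toy game. It is not the authors' estimation procedure, which the abstract does not describe. The region names and the `toy_cost_decrease` function are hypothetical stand-ins for "how much the attacking cost drops when only these regions are perturbed".

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; value maps a frozenset of players to a scalar."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_cost_decrease(regions):
    """Hypothetical game: additive effects plus a synergy between 'a' and 'b'."""
    base = {"a": 1.0, "b": 2.0, "c": 0.5}
    total = sum(base[r] for r in regions)
    if "a" in regions and "b" in regions:
        total += 1.0
    return total


phi = shapley_values(["a", "b", "c"], toy_cost_decrease)
```

In this toy game the synergy of 1.0 between regions a and b is split equally between them, and the attributions sum to the value of perturbing all regions together. Exact enumeration costs on the order of 2^n evaluations, so image-scale attribution would need a sampling-based approximation.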

A Hypothesis for the Aesthetic Appreciation in Neural Networks

Jul 31, 2021

Xu Cheng, Xin Wang, Haotian Xue, Zhengyang Liang, Quanshi Zhang

Abstract: This paper proposes a hypothesis for aesthetic appreciation: aesthetic images make a neural network strengthen salient concepts and discard inessential concepts. To verify this hypothesis, we use multi-variate interactions to represent salient concepts and inessential concepts contained in images. Furthermore, we design a set of operations to revise images towards more beautiful ones. In experiments, we find that the revised images are more aesthetic than the original ones to some extent.
