Abstract: We present Feature-Align CNN (FA-CNN), a prototype CNN architecture with intrinsic class attribution through end-to-end feature alignment. Our intuition is that unordered operations such as Linear and Conv2D layers cause unnecessary shuffling and mixing of semantic concepts, which makes raw feature maps difficult to understand. We introduce two new order-preserving layers: the dampened skip connection and the global average pooling classifier head. These layers force the model to maintain end-to-end feature alignment from the raw input pixels all the way to the final class logits. This alignment enhances the interpretability of the model by allowing the raw feature maps to exhibit class attribution intrinsically. We prove theoretically that the FA-CNN penultimate feature maps are identical to Grad-CAM saliency maps. Moreover, we prove that these feature maps morph gradually, layer by layer, showing how features evolve with network depth toward the penultimate class activations. FA-CNN performs well on benchmark image classification datasets. We also compare the averaged FA-CNN raw feature maps against Grad-CAM and permutation methods in a percent-pixels-removed interpretability task. We conclude with a discussion of future work, including limitations and extensions toward hybrid models.
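The following minimal NumPy sketch shows why a global average pooling (GAP) head can make the penultimate maps coincide with Grad-CAM. It assumes the head is a parameter-free spatial mean over a $K$-channel map with one channel per class; the exact FA-CNN head may differ. With logits $y^c = \frac{1}{Z}\sum_{i,j} A^c_{ij}$ over $Z = H \times W$ positions, we have $\partial y^c / \partial A^k_{ij} = \delta_{ck}/Z$. The Grad-CAM weights are therefore $\alpha^c_k = \delta_{ck}/Z$, and the saliency map is $\mathrm{ReLU}(A^c)/Z$. Under this assumption, that is channel $c$ itself, up to a positive scale and the ReLU. The helper names `gap_head` and `grad_cam` are hypothetical. The gradients are estimated by central differences, so the check does not rely on the analytic result.

```python
import numpy as np


def gap_head(feature_maps):
    """Hypothetical parameter-free GAP head: logit_c is the spatial mean of channel c.

    feature_maps: array of shape (K, H, W), assumed to hold one channel per class.
    """
    return feature_maps.mean(axis=(1, 2))


def grad_cam(feature_maps, head, target_class, eps=1e-4):
    """Grad-CAM map for `target_class`, with d(logit)/dA estimated by central differences."""
    grads = np.zeros_like(feature_maps)
    for idx in np.ndindex(*feature_maps.shape):
        bumped = feature_maps.copy()
        bumped[idx] += eps
        up = head(bumped)[target_class]
        bumped[idx] -= 2 * eps
        down = head(bumped)[target_class]
        grads[idx] = (up - down) / (2 * eps)
    # Grad-CAM channel weights: spatially averaged gradients, alpha_k = mean_ij dy/dA^k_ij
    alphas = grads.mean(axis=(1, 2))
    # Weighted channel sum, sum_k alpha_k * A^k, followed by ReLU -> shape (H, W)
    cam = np.tensordot(alphas, feature_maps, axes=1)
    return np.maximum(cam, 0.0)
```

For example, with random maps `A` of shape `(3, 7, 7)`, `grad_cam(A, gap_head, 1) * 49` agrees with `np.maximum(A[1], 0)` to numerical precision. Every gradient for a channel other than the target is exactly zero, so no mixing across channels occurs.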




Abstract: We present a novel empirical approach toward measuring the Probability Density Function (PDF) of the deep features of Convolutional Neural Networks (CNNs). Measuring the deep feature PDF is valuable for several reasons. Notably, (a) understanding the deep feature PDF yields new insight into deep representations, and (b) feature density methods are important for tasks such as anomaly detection, which can improve the robustness of deep learning models in the wild. Interpretable measurement of the deep feature PDF is challenging due to the Curse of Dimensionality (CoD) and the Spatial Intuition Limitation. Our novel measurement technique combines copula analysis with the Method of Orthogonal Moments (MOM) to directly measure the Generalized Characteristic Function (GCF) of the multivariate deep feature PDF. We find that, surprisingly, the one-dimensional marginals of non-negative deep CNN features after major blocks are not well approximated by a Gaussian distribution, and that these features increasingly approximate an exponential distribution with increasing network depth. Furthermore, we observe that deep features become increasingly independent with increasing network depth within their typical ranges. However, we also observe, surprisingly, that many deep features exhibit strong dependence (either correlation or anti-correlation) with other features' extremely strong detections, even if these features are independent within their typical ranges. We elaborate on these findings in our discussion, where we propose a new hypothesis that exponentially infrequent large-valued features correspond to strong computer vision detections of semantic targets. This would imply that these large-valued features are not outliers but rather an important detection signal.
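The following minimal sketch shows one way to combine copulas with orthogonal moments. It assumes shifted Legendre polynomials on $[0, 1]$ as the orthogonal basis; the abstract does not specify the basis, the maximum order, or how the coefficients are assembled into the GCF. First, a rank transform maps each feature to its empirical copula scale, which removes the marginal shape. The copula density is then expanded as $c(u_1, u_2) = \sum_{m,n} a_{mn}\,\phi_m(u_1)\,\phi_n(u_2)$ with $\phi_n(u) = \sqrt{2n+1}\,P_n(2u-1)$. Because this basis is orthonormal, each coefficient is simply a sample mean, $a_{mn} \approx \frac{1}{N}\sum_s \phi_m(u_{1s})\,\phi_n(u_{2s})$. For independent features every coefficient except $a_{00} = 1$ is zero, so nonzero cross-moments indicate dependence. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def pseudo_observations(x):
    """Map each column of x (n_samples, n_features) into (0, 1) via rank / (n + 1).

    Assumes continuous features (no ties)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n per column
    return ranks / (n + 1.0)


def orthonormal_legendre(u, degree):
    """Evaluate phi_k(u) = sqrt(2k + 1) * P_k(2u - 1) for k = 0..degree.

    Returns an array of shape (degree + 1, len(u))."""
    t = 2.0 * u - 1.0
    rows = []
    for k in range(degree + 1):
        coef = np.zeros(k + 1)
        coef[k] = 1.0  # select the single Legendre polynomial P_k
        rows.append(np.sqrt(2 * k + 1) * legendre.legval(t, coef))
    return np.stack(rows)


def copula_moments(x1, x2, degree=4):
    """Orthogonal-moment coefficients a[m, n] = mean(phi_m(U1) * phi_n(U2)) of the
    empirical copula of two features; a[0, 0] == 1 by construction."""
    u = pseudo_observations(np.column_stack([x1, x2]))
    phi1 = orthonormal_legendre(u[:, 0], degree)
    phi2 = orthonormal_legendre(u[:, 1], degree)
    return phi1 @ phi2.T / u.shape[0]
```

For example, `copula_moments(x, x + 0.5 * noise)` yields a large positive `a[1, 1]`, while two independent samples give cross-moments near zero. This first-order coefficient equals Spearman's rank correlation scaled by $(N-1)/(N+1)$. The higher-order coefficients capture dependence that appears only in part of the range, such as the tails.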
Abstract: Discriminative deep learning models with a linear+softmax final layer have a problem: the latent space only predicts the conditional probabilities $p(Y|X)$ but not the full joint distribution $p(Y,X)$, which necessitates a generative approach. The conditional probability alone cannot detect outliers, which makes softmax networks sensitive to outliers. This exacerbates model over-confidence, affecting many problems such as hallucinations, confounding biases, and dependence on large datasets. To address this, we introduce a novel embedding constraint based on the Method of Moments (MoM). We investigate the use of polynomial moments ranging from 1st-order through 4th-order hyper-covariance matrices. Furthermore, we use this embedding constraint to train an Axis-Aligned Gaussian Mixture Model (AAGMM) final layer, which learns not only the conditional but also the joint distribution of the latent space. We apply this method to semi-supervised image classification by extending FlexMatch with our technique. We find that our MoM constraint with the AAGMM layer matches the reported FlexMatch accuracy while also modeling the joint distribution, thereby reducing outlier sensitivity. We also present a preliminary outlier detection strategy based on Mahalanobis distance and discuss future improvements to this strategy. Code is available at: \url{https://github.com/mmajurski/ssl-gmm}
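The following minimal NumPy sketch illustrates the pieces named above under stated assumptions:

- An axis-aligned (diagonal-covariance) Gaussian mixture head that yields both the joint $\log p(z, y{=}k)$ and the posterior $p(y{=}k \mid z)$.
- A nearest-component Mahalanobis distance as an outlier score.
- A simplified moment penalty.

The penalty is hypothetical. It matches only the per-axis moments of orders 1 through 4 of standardized per-class embeddings to those of $\mathcal{N}(0,1)$ (0, 1, 0, 3), rather than the full hyper-covariance matrices described above. The ssl-gmm code is not reproduced here. In training these would be written in a differentiable framework; NumPy is used only to show the arithmetic.

```python
import numpy as np


def aagmm_log_joint(z, means, variances, log_priors):
    """log p(z, y=k) for an axis-aligned (diagonal) Gaussian mixture.

    z: (n, d) embeddings; means, variances: (K, d); log_priors: (K,). Returns (n, K)."""
    diff = z[:, None, :] - means[None, :, :]                      # (n, K, d)
    maha_sq = np.sum(diff ** 2 / variances[None, :, :], axis=-1)  # (n, K)
    log_det = np.sum(np.log(variances), axis=-1)                  # (K,)
    d = z.shape[1]
    return log_priors - 0.5 * (maha_sq + log_det + d * np.log(2.0 * np.pi))


def posterior(log_joint):
    """p(y=k | z) via a numerically stable softmax over mixture components."""
    shifted = log_joint - log_joint.max(axis=1, keepdims=True)
    p = np.exp(shifted)
    return p / p.sum(axis=1, keepdims=True)


def log_marginal(log_joint):
    """log p(z) = logsumexp_k log p(z, y=k); low values flag candidate outliers."""
    m = log_joint.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True)))[:, 0]


def min_mahalanobis(z, means, variances):
    """Mahalanobis distance to the nearest component under its diagonal covariance."""
    diff = z[:, None, :] - means[None, :, :]
    return np.sqrt(np.sum(diff ** 2 / variances[None, :, :], axis=-1)).min(axis=1)


def marginal_moment_penalty(z, labels, means, variances):
    """Hypothetical MoM-style penalty: per class and per axis, the raw moments of orders
    1..4 of standardized embeddings should match N(0, 1), i.e. (0, 1, 0, 3)."""
    target = np.array([0.0, 1.0, 0.0, 3.0])
    total = 0.0
    for k in np.unique(labels):
        w = (z[labels == k] - means[k]) / np.sqrt(variances[k])              # (n_k, d)
        moments = np.stack([np.mean(w ** p, axis=0) for p in range(1, 5)])  # (4, d)
        total += np.sum((moments - target[:, None]) ** 2)
    return total
```

A point far from every component receives a large Mahalanobis distance and a low $\log p(z)$. Its softmax posterior can nevertheless still be close to one for some class. This is the outlier sensitivity of conditional-only models that modeling the joint distribution is meant to expose.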