Wei Hu

MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition

Aug 19, 2023
Qihao Zhao, Chen Jiang, Wei Hu, Fan Zhang, Jun Liu

Recently, multi-expert methods have led to significant improvements in long-tail recognition (LTR). We summarize two aspects that need further enhancement to boost LTR: (1) more diverse experts; (2) lower model variance. However, previous methods did not handle them well. To this end, we propose More Diverse experts with Consistency Self-distillation (MDCS) to bridge the gap left by earlier methods. Our MDCS approach consists of two core components: Diversity Loss (DL) and Consistency Self-distillation (CS). In detail, DL promotes diversity among experts by controlling their focus on different categories. To reduce the model variance, we employ KL divergence to distill the richer knowledge of weakly augmented instances for the experts' self-distillation. In particular, we design Confident Instance Sampling (CIS) to select the correctly classified instances for CS to avoid biased/noisy knowledge. In the analysis and ablation study, we demonstrate that, compared with previous work, our method can effectively increase the diversity of experts, significantly reduce the variance of the model, and improve recognition accuracy. Moreover, the roles of our DL and CS are mutually reinforcing and coupled: the diversity of experts benefits from the CS, and the CS cannot achieve remarkable results without the DL. Experiments show our MDCS outperforms the state-of-the-art by 1% to 2% on five popular long-tailed benchmarks, including CIFAR10-LT, CIFAR100-LT, ImageNet-LT, Places-LT, and iNaturalist 2018. The code is available at https://github.com/fistyee/MDCS.
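The consistency self-distillation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see their repository for that); the function names, the temperature parameter, and the choice of KL(weak || strong) direction are assumptions made for illustration. The sketch keeps only instances whose weakly augmented view is classified correctly (the idea behind Confident Instance Sampling) and averages the KL divergence between the weak-view and strong-view predictions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(weak_logits, strong_logits, labels, temperature=1.0):
    """Hypothetical consistency self-distillation loss for one expert.

    weak_logits / strong_logits: per-instance logits for the weakly and
    strongly augmented views. Instances whose weak view is misclassified
    are skipped, so that no biased or noisy targets are distilled.
    """
    losses = []
    for weak, strong, label in zip(weak_logits, strong_logits, labels):
        teacher = softmax(weak, temperature)
        predicted = max(range(len(teacher)), key=teacher.__getitem__)
        if predicted != label:
            continue  # confident instance sampling: drop wrong predictions
        student = softmax(strong, temperature)
        losses.append(kl_divergence(teacher, student))
    return sum(losses) / len(losses) if losses else 0.0
```

In a real training loop the weak-view distribution would typically be treated as a fixed target (no gradient), which this pure-Python sketch does not model.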

* Accepted at ICCV 2023; 13 pages

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

Aug 15, 2023
Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu

With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation. Previous 3D adversarial attackers either follow the white-box setting to iteratively update the coordinate perturbations based on gradients, or utilize the output model logits to estimate noisy gradients in the black-box setting. However, these attack methods are hard to deploy in real-world scenarios, since realistic 3D applications will not share any model details with users. Therefore, we explore a more challenging yet practical 3D attack setting, i.e., attacking point clouds with black-box hard labels, in which the attacker only has access to the predicted label of the input. To tackle this setting, we propose a novel 3D attack method, termed 3D Hard-label attacker (3DHacker), based on the developed decision boundary algorithm to generate adversarial samples solely with the knowledge of class labels. Specifically, to construct the class-aware model decision boundary, 3DHacker first randomly fuses two point clouds of different classes in the spectral domain to craft their intermediate sample with high imperceptibility, then projects it onto the decision boundary via binary search. To restrict the final perturbation size, 3DHacker further introduces an iterative optimization strategy to move the intermediate sample along the decision boundary for generating adversarial point clouds with the smallest trivial perturbations. Extensive evaluations show that, even in the challenging hard-label setting, 3DHacker still competitively outperforms existing 3D attacks regarding the attack performance as well as adversary quality.
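The binary-search projection onto the decision boundary mentioned in this abstract can be sketched as follows. This is a generic hard-label boundary search, not the authors' code: it interpolates raw point coordinates rather than fusing in the spectral domain, and `predict_label`, `interpolate`, and `project_to_boundary` are hypothetical names. The only access to the model is the predicted label, matching the hard-label setting.

```python
def interpolate(cloud_a, cloud_b, t):
    """Pointwise blend of two equally sized point clouds; t=0 gives cloud_a."""
    return [
        tuple((1.0 - t) * a + t * b for a, b in zip(point_a, point_b))
        for point_a, point_b in zip(cloud_a, cloud_b)
    ]


def project_to_boundary(clean, intermediate, predict_label, tol=1e-4):
    """Binary search for the misclassified sample closest to `clean`.

    `predict_label` is a hypothetical hard-label oracle returning only a
    class label. `intermediate` must already be classified differently
    from `clean`.
    """
    source_label = predict_label(clean)
    if predict_label(intermediate) == source_label:
        raise ValueError("intermediate sample must cross the decision boundary")
    low, high = 0.0, 1.0  # fraction of the way from clean to intermediate
    while high - low > tol:
        mid = (low + high) / 2.0
        if predict_label(interpolate(clean, intermediate, mid)) == source_label:
            low = mid   # still on the clean side: move toward intermediate
        else:
            high = mid  # already adversarial: move toward clean
    return interpolate(clean, intermediate, high)
```

Each loop iteration costs one model query, so the number of queries grows with log2(1 / tol).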

* Accepted by ICCV 2023

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

Jul 17, 2023
Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu

Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the intriguing observation that different solutions can be connected by a linear path in the parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which says that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, which reveal new insights into the spawning and permutation approaches. The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
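As a rough illustration of what layerwise linear feature connectivity asks for, the sketch below compares, layer by layer, the features of a weight-interpolated network with the interpolation of the two networks' features. The bias-free ReLU network, the use of cosine similarity as the alignment score, and all function names are assumptions made for this sketch, not details taken from the abstract.

```python
import math


def relu_layer(weights, inputs):
    """One bias-free ReLU layer; `weights` is a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward_features(layers, inputs):
    """Return the feature vector produced after every layer."""
    features, hidden = [], inputs
    for weights in layers:
        hidden = relu_layer(weights, hidden)
        features.append(hidden)
    return features


def interpolate_params(layers_a, layers_b, alpha):
    """Elementwise alpha * A + (1 - alpha) * B over all weight matrices."""
    return [
        [[alpha * wa + (1.0 - alpha) * wb for wa, wb in zip(row_a, row_b)]
         for row_a, row_b in zip(mat_a, mat_b)]
        for mat_a, mat_b in zip(layers_a, layers_b)
    ]


def cosine(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def layerwise_feature_alignment(layers_a, layers_b, inputs, alpha=0.5):
    """Per-layer cosine between features of the interpolated network and
    the interpolated features of the two endpoint networks."""
    feats_a = forward_features(layers_a, inputs)
    feats_b = forward_features(layers_b, inputs)
    feats_mid = forward_features(interpolate_params(layers_a, layers_b, alpha), inputs)
    scores = []
    for fa, fb, fm in zip(feats_a, feats_b, feats_mid):
        mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(fa, fb)]
        scores.append(cosine(fm, mixed))
    return scores
```

Scores close to 1 at every layer would be consistent with the kind of connectivity the abstract describes; for two arbitrary, unrelated networks there is no reason to expect that.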

* 25 pages, 23 figures

IR Design for Application-Specific Natural Language: A Case Study on Traffic Data

Jul 13, 2023
Wei Hu, Xuhong Wang, Ding Wang, Shengyue Yao, Zuqiu Mao, Li Li, Fei-Yue Wang, Yilun Lin

In the realm of software applications in the transportation industry, Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their ease of use and various other benefits. With the ceaseless progress in computer performance and the rapid development of large-scale models, the possibility of programming using natural language in specified applications - referred to as Application-Specific Natural Language (ASNL) - has emerged. ASNL exhibits greater flexibility and freedom, which, in turn, leads to an increase in computational complexity for parsing and a decrease in processing performance. To tackle this issue, our paper advances a design for an intermediate representation (IR) that caters to ASNL and can uniformly process transportation data into graph data format, improving data processing performance. Experimental comparisons reveal that in standard data query operations, our proposed IR design can achieve a speed improvement of over forty times compared to direct usage of standard XML format data.

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

Jun 29, 2023
Yongyi Yang, Jacob Steinhardt, Wei Hu

Recent work has observed an intriguing "Neural Collapse" phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of the input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of the input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve 93% accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.

* This paper has been accepted as a conference paper at ICML 2023

Joint Pre-training and Local Re-training: Transferable Representation Learning on Multi-source Knowledge Graphs

Jun 05, 2023
Zequn Sun, Jiacheng Huang, Jinghao Lin, Xiaozhou Xu, Qijin Chen, Wei Hu

In this paper, we present the "joint pre-training and local re-training" framework for learning and applying multi-source knowledge graph (KG) embeddings. We are motivated by the fact that different KGs contain complementary information to improve KG embeddings and downstream tasks. We pre-train a large teacher KG embedding model over linked multi-source KGs and distill knowledge to train a student model for a task-specific KG. To enable knowledge transfer across different KGs, we use entity alignment to build a linked subgraph for connecting the pre-trained KGs and the target KG. The linked subgraph is re-trained for three-level knowledge distillation from the teacher to the student, i.e., feature knowledge distillation, network knowledge distillation, and prediction knowledge distillation, to generate more expressive embeddings. The teacher model can be reused for different target KGs and tasks without having to train from scratch. We conduct extensive experiments to demonstrate the effectiveness and efficiency of our framework.

* Accepted in the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2023)

What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings

Jun 05, 2023
Zequn Sun, Jiacheng Huang, Xiaozhou Xu, Qijin Chen, Weijun Ren, Wei Hu

Joint representation learning over multi-sourced knowledge graphs (KGs) yields transferable and expressive embeddings that improve downstream tasks. Entity alignment (EA) is a critical step in this process. Despite recent considerable research progress in embedding-based EA, how it works remains to be explored. In this paper, we provide a similarity flooding perspective to explain existing translation-based and aggregation-based EA models. We prove that the embedding learning process of these models actually seeks a fixpoint of pairwise similarities between entities. We also provide experimental evidence to support our theoretical analysis. We propose two simple but effective methods inspired by the fixpoint computation in similarity flooding, and demonstrate their effectiveness on benchmark datasets. Our work bridges the gap between recent embedding-based models and the conventional similarity flooding algorithm. It would improve our understanding of and increase our faith in embedding-based EA.
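For readers unfamiliar with the fixpoint computation this abstract refers to, here is a minimal sketch in the spirit of the conventional similarity flooding algorithm: pairwise similarities are repeatedly propagated between entity pairs connected by the same relation, then normalized, until they stop changing. It is not the method proposed in the paper. The uniform propagation weight of 1, the update rule sim <- normalize(sim + propagated), and all names are simplifying assumptions.

```python
def similarity_flooding(triples_1, triples_2, initial_sim, max_iter=100, tol=1e-6):
    """Iterate pairwise entity similarities to an approximate fixpoint.

    triples_1 / triples_2: (head, relation, tail) triples of the two KGs.
    initial_sim: dict mapping (entity_1, entity_2) pairs to seed scores.
    """
    # Two entity pairs are neighbours when both KGs link them with the
    # same relation, e.g. (h1, h2) <-> (t1, t2).
    neighbours = {}
    for h1, r1, t1 in triples_1:
        for h2, r2, t2 in triples_2:
            if r1 != r2:
                continue
            neighbours.setdefault((h1, h2), []).append((t1, t2))
            neighbours.setdefault((t1, t2), []).append((h1, h2))

    sim = dict(initial_sim)
    for _ in range(max_iter):
        updated = dict(sim)
        for pair, value in sim.items():
            for other in neighbours.get(pair, []):
                updated[other] = updated.get(other, 0.0) + value
        top = max(updated.values(), default=0.0)
        if top > 0.0:
            updated = {pair: value / top for pair, value in updated.items()}
        delta = max(
            (abs(value - sim.get(pair, 0.0)) for pair, value in updated.items()),
            default=0.0,
        )
        sim = updated
        if delta < tol:
            break  # similarities no longer change: approximate fixpoint
    return sim
```

Starting from a single seed alignment, similarity spreads only to pairs reachable through matching relations, which is why unrelated pairs keep a score of zero.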

* Accepted in the 40th International Conference on Machine Learning (ICML 2023)

The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks

Jun 01, 2023
Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu

Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we investigate this phenomenon by narrowing our focus to deep linear networks. Through our analysis, we reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures. Specifically, we show that the evolution of gradient descent starting from orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all weight parameters are updated throughout training. This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks. First, the analysis enables us to considerably improve training efficiency by taking advantage of the low-dimensional structure in learning dynamics. We can construct smaller, equivalent deep linear networks without sacrificing the benefits associated with the wider counterparts. Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers. We also conduct numerical experiments to support our theoretical results. The code for our experiments can be found at https://github.com/cjyaras/lawofparsimony.

* The first two authors contributed to this work equally; 32 pages, 12 figures

Robust Sparse Mean Estimation via Incremental Learning

May 24, 2023
Jianhao Ma, Rui Ray Chen, Yinghui He, Salar Fattahi, Wei Hu

In this paper, we study the problem of robust sparse mean estimation, where the goal is to estimate a $k$-sparse mean from a collection of partially corrupted samples drawn from a heavy-tailed distribution. Existing estimators face two critical challenges in this setting. First, they are limited by a conjectured computational-statistical tradeoff, implying that any computationally efficient algorithm needs $\tilde\Omega(k^2)$ samples, while its statistically-optimal counterpart only requires $\tilde O(k)$ samples. Second, the existing estimators fall short of practical use as they scale poorly with the ambient dimension. This paper presents a simple mean estimator that overcomes both challenges under moderate conditions: it runs in near-linear time and memory (both with respect to the ambient dimension) while requiring only $\tilde O(k)$ samples to recover the true mean. At the core of our method lies an incremental learning phenomenon: we introduce a simple nonconvex framework that can incrementally learn the top-$k$ nonzero elements of the mean while keeping the zero elements arbitrarily small. Unlike existing estimators, our method does not need any prior knowledge of the sparsity level $k$. We prove the optimality of our estimator by providing a matching information-theoretic lower bound. Finally, we conduct a series of simulations to corroborate our theoretical findings. Our code is available at https://github.com/huihui0902/Robust_mean_estimation.

Using a Bayesian-Inference Approach to Calibrating Models for Simulation in Robotics

May 11, 2023
Huzaifa Mustafa Unjhawala, Ruochun Zhang, Wei Hu, Jinlong Wu, Radu Serban, Dan Negrut

In robotics, simulation has the potential to reduce design time and costs, and lead to a more robust engineered solution and a safer development process. However, the use of simulators is predicated on the availability of good models. This contribution is concerned with improving the quality of these models via calibration, which is cast herein in a Bayesian framework. First, we discuss the Bayesian machinery involved in model calibration. Then, we demonstrate it in one example: calibration of a vehicle dynamics model that has a low degree-of-freedom count and can be used for state estimation, model predictive control, or path planning. A high-fidelity simulator is used to emulate the "experiments" and generate the data for the calibration. The merit of this work is not tied to a new Bayesian methodology for calibration, but to the demonstration of how the Bayesian machinery can establish connections among models in computational dynamics, even when the data in use is noisy. The software used to generate the results reported herein is available in a public repository for unfettered use and distribution.

* 061004-18 / Vol. 18, JUNE 2023
* 19 pages, 42 figures
