Linghao Kong

Expand Neurons, Not Parameters

Oct 06, 2025

Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit

Abstract: This work demonstrates how increasing the number of neurons in a network without increasing its number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features that would otherwise share the same neurons. To reduce such entanglement at a fixed non-zero parameter count, we introduce Fixed Parameter Expansion (FPE): replace a neuron with multiple children and partition the parent's weights disjointly across them, so that each child inherits a non-overlapping subset of connections. On symbolic tasks, specifically Boolean code problems, clause-aligned FPE systematically reduces polysemanticity metrics and yields higher task accuracy. Notably, random splits of neuron weights approximate these gains, indicating that reduced collisions, not precise assignment, are a primary driver. Consistent with the superposition hypothesis, the benefits of FPE grow with increasing interference: when polysemantic load is high, accuracy improvements are the largest. Transferring these insights to real models (classifiers over CLIP embeddings and deeper multilayer networks), we find that widening networks while maintaining a constant non-zero parameter count consistently increases accuracy. These results identify an interpretability-grounded mechanism to leverage width against superposition, improving performance without increasing the number of non-zero parameters. Such a direction is well matched to modern accelerators, where memory movement of non-zero parameters, rather than raw compute, is the dominant bottleneck.
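The splitting step described in the abstract can be illustrated with a minimal sketch. It covers only the random-split variant for a single neuron; the function names (`split_neuron`, `pre_activation`, `nonzero`) and the example weights are hypothetical, and the clause-aligned assignment and any training procedure from the paper are not shown.

```python
import random

def split_neuron(weights, num_children, seed=0):
    """Randomly partition one neuron's incoming weights across children.

    Each input connection is given to exactly one child; all other entries
    of that child's weight vector stay zero, so the total number of
    non-zero parameters does not grow.
    """
    rng = random.Random(seed)
    children = [[0.0] * len(weights) for _ in range(num_children)]
    for i, w in enumerate(weights):
        children[rng.randrange(num_children)][i] = w
    return children

def pre_activation(weights, x):
    """Dot product of a weight vector with an input vector."""
    return sum(w * xi for w, xi in zip(weights, x))

def nonzero(weights):
    """Number of non-zero entries in a weight vector."""
    return sum(1 for w in weights if w != 0.0)

# Hypothetical parent neuron and input, chosen only for illustration.
parent = [0.5, -1.0, 2.0, 0.0, 1.5, -0.25]
children = split_neuron(parent, num_children=3)
x = [1.0, 2.0, -1.0, 3.0, 0.5, 4.0]

print("non-zeros in parent:  ", nonzero(parent))
print("non-zeros in children:", sum(nonzero(c) for c in children))
print("parent pre-activation:", pre_activation(parent, x))
print("sum over children:    ", sum(pre_activation(c, x) for c in children))
```

The children's pre-activations add up to the parent's, but each child passes through its own nonlinearity, so the expanded network is a different function from the original one. This sketch does not model that part.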

* 10 pages, 6 figures

Sparse Expansion and Neuronal Disentanglement

May 24, 2024

Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same inference FLOP budget per token, and that this gap grows as sparsity increases, leading to inference speedups. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively $\textit{disentangling}$ the input-output relationship of every individual neuron across clusters of inputs. Specifically, sparse experts approximate the dense neuron output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each with a separate sparse dot product covering it. Interestingly, we show that the Wasserstein distance between a neuron's output distribution and a Gaussian distribution is an indicator of its entanglement level and contribution to the accuracy of the model. Every layer of an LLM has a fraction of highly entangled Wasserstein neurons, and model performance suffers more when these are sparsified as opposed to others.
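The Wasserstein indicator mentioned in the abstract can be approximated from samples of a neuron's outputs. The sketch below is one possible estimator, not the paper's procedure: it assumes the outputs are first standardized to zero mean and unit variance, and it compares their sorted values against standard-normal quantiles to get an empirical 1-Wasserstein distance. The function name `wasserstein_to_gaussian` and the synthetic data are hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev

def wasserstein_to_gaussian(outputs):
    """Empirical 1-Wasserstein distance between standardized samples
    and a standard normal distribution.

    In one dimension the optimal transport plan matches sorted samples
    to quantiles, so the distance is the mean absolute gap between them.
    """
    mu, sigma = fmean(outputs), pstdev(outputs)
    z = sorted((v - mu) / sigma for v in outputs)
    n = len(z)
    std_normal = NormalDist()
    # Mid-point quantiles avoid evaluating inv_cdf at 0 or 1.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return fmean(abs(a - b) for a, b in zip(z, quantiles))

rng = random.Random(0)

# Synthetic stand-ins for neuron outputs (assumed data, not from the paper).
unimodal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
bimodal = [
    rng.gauss(-3.0, 0.5) if rng.random() < 0.5 else rng.gauss(3.0, 0.5)
    for _ in range(2000)
]

d_unimodal = wasserstein_to_gaussian(unimodal)
d_bimodal = wasserstein_to_gaussian(bimodal)
print("unimodal:", d_unimodal)
print("bimodal: ", d_bimodal)
```

Under this estimator, an output distribution made of two well-separated modes sits farther from a Gaussian than a single-mode one, which matches the intuition of a neuron whose outputs mix several simpler distributions.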

* 9 pages, 8 figures, Submitted to NeurIPS 2024 main track

Defending Adversarial Examples by Negative Correlation Ensemble

Jun 11, 2022

Wenjian Luo, Hongwei Zhang, Linghao Kong, Zhijian Chen, Ke Tang

Abstract: Security issues in DNNs, such as adversarial examples, have attracted much attention. Adversarial examples are inputs that, through carefully designed perturbations, cause DNNs to return completely wrong predictions. Adversarial examples therefore pose great security risks to the development of deep learning. Several defense approaches against adversarial examples have recently been proposed; however, in our opinion, the performance of these approaches is still limited. In this paper, we propose a new ensemble defense approach named the Negative Correlation Ensemble (NCEn), which achieves compelling results by making the gradient directions and gradient magnitudes of the ensemble members negatively correlated while also reducing the transferability of adversarial examples among them. Extensive experiments have been conducted, and the results demonstrate that NCEn can effectively improve the adversarial robustness of ensembles.
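The abstract does not give the NCEn objective, so the following is only a rough illustration of what "less aligned gradient directions" can mean numerically. It computes the mean pairwise cosine similarity between the members' input-gradient vectors, a quantity that is 1 when all gradients point the same way and becomes negative when they spread apart. The function names and the gradient values are hypothetical, and the magnitude term and transferability term described in the abstract are not modeled.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(grads):
    """Average cosine similarity over all pairs of member gradients."""
    pairs = list(combinations(grads, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical input gradients for a three-member ensemble.
aligned = [[1.0, 2.0], [2.0, 4.0], [0.5, 1.0]]  # same direction
spread = [
    [1.0, 0.0],
    [-0.5, math.sqrt(3) / 2],
    [-0.5, -math.sqrt(3) / 2],
]  # unit vectors 120 degrees apart

score_aligned = mean_pairwise_cosine(aligned)
score_spread = mean_pairwise_cosine(spread)
print("aligned members:", score_aligned)
print("spread members: ", score_spread)
```

A perturbation that follows one member's gradient is less likely to help against the other members when this score is low, which is the intuition behind reducing transferability within an ensemble.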
