Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mihai Suteu

Simplifying Multi-Task Architectures Through Task-Specific Normalization

Dec 23, 2025

Mihai Suteu, Ovidiu Serban

Abstract:Multi-task learning (MTL) aims to leverage shared knowledge across tasks to improve generalization and parameter efficiency, yet balancing resources and mitigating interference remain open challenges. Architectural solutions often introduce elaborate task-specific modules or routing schemes, increasing complexity and overhead. In this work, we show that normalization layers alone are sufficient to address many of these challenges. Simply replacing shared normalization with task-specific variants already yields competitive performance, questioning the need for complex designs. Building on this insight, we propose Task-Specific Sigmoid Batch Normalization (TS$σ$BN), a lightweight mechanism that enables tasks to softly allocate network capacity while fully sharing feature extractors. TS$σ$BN improves stability across CNNs and Transformers, matching or exceeding performance on NYUv2, Cityscapes, CelebA, and PascalContext, while remaining highly parameter-efficient. Moreover, its learned gates provide a natural framework for analyzing MTL dynamics, offering interpretable insights into capacity allocation, filter specialization, and task relationships. Our findings suggest that complex MTL architectures may be unnecessary and that task-specific normalization offers a simple, interpretable, and efficient alternative.

Via

Access Paper or Ask Questions

Receding Neuron Importances for Structured Pruning

Apr 13, 2022

Mihai Suteu, Yike Guo

Figure 1 for Receding Neuron Importances for Structured Pruning

Figure 2 for Receding Neuron Importances for Structured Pruning

Figure 3 for Receding Neuron Importances for Structured Pruning

Figure 4 for Receding Neuron Importances for Structured Pruning

Abstract:Structured pruning efficiently compresses networks by identifying and removing unimportant neurons. While this can be elegantly achieved by applying sparsity-inducing regularisation on BatchNorm parameters, an L1 penalty would shrink all scaling factors rather than just those of superfluous neurons. To tackle this issue, we introduce a simple BatchNorm variation with bounded scaling parameters, based on which we design a novel regularisation term that suppresses only neurons with low importance. Under our method, the weights of unnecessary neurons effectively recede, producing a polarised bimodal distribution of importances. We show that neural networks trained this way can be pruned to a larger extent and with less deterioration. We one-shot prune VGG and ResNet architectures at different ratios on CIFAR and ImagenNet datasets. In the case of VGG-style networks, our method significantly outperforms existing approaches particularly under a severe pruning regime.

Via

Access Paper or Ask Questions

Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Dec 14, 2019

Mihai Suteu, Yike Guo

Figure 1 for Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Figure 2 for Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Figure 3 for Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Figure 4 for Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Abstract:Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model's limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared parameters using this property encourages task specific decoders to optimize different parts of the feature extractor, thus reducing competition. We evaluate our method with classification and regression tasks on the multiDigitMNIST, NYUv2 and SUN RGB-D datasets where we obtain competitive results.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions