Abstract: Bridging clinical diagnostic reasoning with AI remains a central challenge in medical imaging. We introduce MedCLM, an automated pipeline that converts detection datasets into large-scale medical visual question answering (VQA) data with Chain-of-Thought (CoT) reasoning by linking lesion boxes to organ segmentation and structured rationales. These contextual signals enable medical vision-language models to generate question-answer pairs with step-by-step reasoning. To utilize this data effectively, we propose an Integrated CoT-Curriculum Strategy composed of an Easy stage with explicit lesion boxes for visual grounding, a Medium stage that encourages implicit localization, and a Hard stage for weakly supervised reasoning. Experimental results demonstrate that MedCLM attains state-of-the-art performance on several medical VQA benchmarks, providing a scalable framework for developing clinically aligned medical vision-language models.
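To make the pipeline concrete, the following is a minimal Python sketch of how one detection annotation could be turned into a CoT VQA example under the three curriculum stages. The data schema, question and rationale templates, and function names are illustrative assumptions, not the actual MedCLM implementation.

```python
# Hypothetical sketch of the detection-to-VQA conversion the abstract
# describes: a lesion box is linked to an organ label (recovered from
# segmentation) and turned into a QA pair whose supervision varies by
# curriculum stage. All names and templates here are assumptions.
from dataclasses import dataclass

@dataclass
class Detection:
    image_id: str
    lesion_box: tuple   # (x1, y1, x2, y2) in pixel coordinates
    lesion_label: str   # e.g. "nodule"
    organ: str          # organ recovered from a segmentation mask

def to_vqa_example(det: Detection, stage: str) -> dict:
    """Build one CoT VQA example; `stage` is 'easy', 'medium', or 'hard'."""
    question = f"Is there a {det.lesion_label} in this image, and where?"
    rationale = (
        f"Step 1: Inspect the {det.organ}. "
        f"Step 2: A {det.lesion_label} is visible within it. "
        f"Step 3: Conclude the finding and its location."
    )
    answer = f"Yes, a {det.lesion_label} in the {det.organ}."
    example = {"image_id": det.image_id, "question": question,
               "rationale": rationale, "answer": answer}
    if stage == "easy":
        # Easy stage: expose the explicit lesion box for visual grounding.
        example["lesion_box"] = det.lesion_box
    elif stage == "medium":
        # Medium stage: no box in the input; the rationale still names
        # the organ, encouraging implicit localization.
        pass
    else:
        # Hard stage: weak supervision only -- drop the structured
        # rationale and keep the final answer.
        example.pop("rationale")
    return example

det = Detection("ct_0001", (120, 88, 164, 130), "nodule", "left lung")
for stage in ("easy", "medium", "hard"):
    print(stage, to_vqa_example(det, stage))
```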
Abstract: Gradient dynamics play a central role in determining the stability and generalization of deep neural networks. In this work, we provide an empirical analysis of how the variance and standard deviation of gradients evolve during training, showing consistent changes both across layers and at the global scale in convolutional networks. Motivated by these observations, we propose a hyperparameter-free gradient normalization method that aligns the scaling of gradients with their natural evolution. This approach prevents unintended amplification, stabilizes optimization, and preserves convergence guarantees. Experiments on the challenging CIFAR-100 benchmark with ResNet-20, ResNet-56, and VGG-16-BN demonstrate that our method maintains or improves test accuracy while preserving strong generalization. Beyond practical performance, our study highlights the importance of directly tracking gradient dynamics, aiming to bridge the gap between theoretical expectations and empirical behavior and to provide insights for future optimization research.
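One plausible instantiation of such a normalization is to rescale each layer's gradient by its own standard deviation, so that the update scale tracks the gradient statistics as they evolve, with no tunable constants. The PyTorch sketch below assumes this per-parameter-tensor rule, which is a guess at the paper's exact statistic, not the authors' method.

```python
# Minimal sketch of per-layer gradient standardization, one plausible
# reading of the hyperparameter-free normalization the abstract describes;
# the exact statistic the authors track is an assumption here.
import torch
import torch.nn as nn

def normalize_gradients(model: nn.Module) -> None:
    """Rescale each parameter's gradient by its own standard deviation,
    so the update magnitude follows the gradient's natural evolution
    instead of amplifying it."""
    for p in model.parameters():
        if p.grad is None or p.grad.numel() < 2:
            continue  # std is undefined for a single element
        std = p.grad.std()
        if torch.isfinite(std) and std > 0:
            p.grad.div_(std)  # in-place rescale; no tunable constants

# Usage inside a standard training step:
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
normalize_gradients(model)  # called between backward() and step()
opt.step()
```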
Abstract: Low-precision training has become essential for reducing the computational and memory costs of large-scale deep learning. However, quantization of gradients introduces both magnitude shrinkage and additive noise, which can alter the convergence behavior of stochastic gradient descent (SGD). In this work, we study the convergence of SGD under a gradient shrinkage model, where each stochastic gradient is scaled by a factor $q_k \in (0,1]$ and perturbed by zero-mean quantization noise. We show that this shrinkage is equivalent to replacing the nominal stepsize $\mu_k$ with an effective stepsize $\mu_k q_k$, which slows convergence when $q_{\min} < 1$. Under standard smoothness and bounded-variance assumptions, we prove that low-precision SGD still converges, but at a reduced rate determined by $q_{\min}$ and with an increased asymptotic error floor due to quantization noise. In short, by modeling reduced numerical precision as gradient shrinkage within the standard SGD convergence framework, we give a theoretical account of how low precision slows down training.
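The effective-stepsize equivalence can be read directly off the update rule. Writing $x_k$ for the iterate, $g_k$ for the stochastic gradient, and $\xi_k$ for the zero-mean quantization noise (symbols introduced here for illustration; the abstract specifies only the shrinkage factor and the noise model), one step of low-precision SGD is

\[
x_{k+1} \;=\; x_k - \mu_k \bigl( q_k\, g_k + \xi_k \bigr) \;=\; x_k - (\mu_k q_k)\, g_k \;-\; \mu_k \xi_k .
\]

The first term is an exact SGD step with stepsize $\mu_k q_k \le \mu_k$, which explains the slowdown governed by $q_{\min}$; the residual term $\mu_k \xi_k$ injects variance that does not vanish with the gradient, which is the source of the raised asymptotic error floor.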