Abstract: We present a comprehensive comparison of convolutional and transformer-based models for distinguishing quark and gluon jets using simulated jet images from Pythia 8. By encoding jet substructure into a three-channel representation of particle kinematics, we evaluate the performance of convolutional neural networks (CNNs), Vision Transformers (ViTs), and Swin Transformers (Swin-Tiny) under both supervised and self-supervised learning setups. Our results show that fine-tuning only the final two transformer blocks of the Swin-Tiny model achieves the best trade-off between efficiency and accuracy, reaching 81.4% accuracy and an AUC (area under the ROC curve) of 88.9%. Self-supervised pretraining with Momentum Contrast (MoCo) further enhances feature robustness and reduces the number of trainable parameters. These findings highlight the potential of hierarchical attention-based models for jet substructure studies and for domain transfer to real collision data.
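A minimal sketch of the partial fine-tuning setup described above, assuming torchvision's `swin_t` implementation and interpreting "the final two transformer blocks" as the two blocks of the last Swin stage; the paper's exact layer selection, preprocessing, and head configuration are not given in the abstract, so these are illustrative assumptions.

```python
# Sketch: fine-tune only the last Swin-Tiny stage (two blocks) plus the
# classification head, freezing everything else. Layer choice, input size,
# and the binary head are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import swin_t, Swin_T_Weights

model = swin_t(weights=Swin_T_Weights.IMAGENET1K_V1)

# Freeze every parameter, then unfreeze only the final stage.
for p in model.parameters():
    p.requires_grad = False
# features[7] is the last stage of Swin-T: a Sequential of two SwinTransformerBlocks.
for p in model.features[7].parameters():
    p.requires_grad = True

# Replace the 1000-way ImageNet head with a binary quark-vs-gluon head
# (a fresh nn.Linear has requires_grad=True by default, so it trains too).
model.head = nn.Linear(model.head.in_features, 2)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# Three-channel jet images, assumed resized to 224x224 to match Swin's input.
x = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()
optimizer.step()
```

Freezing the backbone and updating only the last stage keeps the trainable-parameter count small, which is the efficiency/accuracy trade-off the abstract refers to.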
Abstract: We propose Anti-regularization (AR), which adds a sign-reversed reward term to the loss to intentionally increase model expressivity in the small-sample regime, and then attenuates this intervention with a power-law decay as the sample size grows. We formalize spectral safety and trust-region conditions, and design a lightweight stability safeguard that combines a projection operator with gradient clipping, ensuring stable intervention under stated assumptions. Our analysis spans linear smoothers and the Neural Tangent Kernel (NTK) regime, providing practical guidance on selecting the decay exponent by balancing empirical risk against variance. Empirically, AR reduces underfitting while preserving generalization and improving calibration in both regression and classification. Ablation studies confirm that the decay schedule and the stability safeguard are critical to preventing overfitting and numerical instability. We further examine a degrees-of-freedom targeting schedule that keeps per-sample complexity approximately constant. AR is simple to implement and reproducible, integrating cleanly into standard empirical risk minimization pipelines. It enables robust learning in data- and resource-constrained settings by intervening only when beneficial and fading away when unnecessary.
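A minimal sketch of the AR idea as stated: an empirical-risk term minus a reward scaled by a power-law factor λ(n) = λ₀·n^(−γ), with gradient clipping and a norm projection as the stability safeguard. The abstract does not specify the reward term, the exponent, or the projection; the L2-norm reward, γ = 0.5, and the trust-region radius below are illustrative assumptions, not the paper's method.

```python
# Sketch of Anti-regularization (AR): subtract a reward term scaled by a
# power-law decay in the sample size n, with gradient clipping and a norm
# projection as a stability safeguard. The concrete reward (an L2 norm here),
# the exponent gamma, and the trust-region radius are illustrative assumptions.
import torch
import torch.nn as nn

def ar_coefficient(n, lam0=0.1, gamma=0.5):
    """Power-law decay lambda(n) = lam0 * n**(-gamma): strong intervention for
    small n, fading toward zero as the sample size grows."""
    return lam0 * n ** (-gamma)

def ar_loss(model, inputs, targets, n):
    risk = nn.functional.mse_loss(model(inputs), targets)  # empirical risk
    # Sign-reversed penalty: subtracting the L2 norm *rewards* expressivity,
    # the opposite of ridge-style shrinkage.
    reward = sum(p.pow(2).sum() for p in model.parameters())
    return risk - ar_coefficient(n) * reward

def project_parameters(model, radius=10.0):
    """Trust-region safeguard: rescale any parameter tensor whose norm
    exceeds `radius` back onto the ball of that radius."""
    with torch.no_grad():
        for p in model.parameters():
            norm = p.norm()
            if norm > radius:
                p.mul_(radius / norm)

# Toy regression run in the small-sample regime (n = 32).
n = 32
X, y = torch.randn(n, 5), torch.randn(n, 1)
model = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    loss = ar_loss(model, X, y, n)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # safeguard
    opt.step()
    project_parameters(model)  # keep parameters inside the trust region
```

Because λ(n) decays with n, the same loop applied to a large dataset leaves the objective approximately equal to plain empirical risk minimization, matching the abstract's claim that AR fades away when unnecessary.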