Abstract:We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.



Abstract:In this study, we explore an ensemble-based approach to improve classification accuracy in complex image datasets. Utilizing a Convolutional Block Attention Module (CBAM) alongside a Deep Neural Network (DNN) we leverage the unique feature-extraction capabilities of each model to enhance the overall accuracy. Additional models, such as Random Forest, XGBoost, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), are introduced to further diversify the predictive power of our ensemble. By leveraging these methods, the proposed approach provides robust feature discrimination and improved classification results. Experimental evaluations demonstrate that the ensemble achieves higher accuracy and robustness across challenging and imbalanced classes, showing significant promise for broader applications in computer vision tasks.