Ilya Loshchilov

LIS

NVIDIA Nemotron 3: Efficient and Open Intelligence

Dec 24, 2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Dec 23, 2025

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Oct 01, 2024

Weight Norm Control

Nov 21, 2023

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Feb 24, 2018

Fixing Weight Decay Regularization in Adam

Feb 14, 2018

A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

Aug 23, 2017

Limited-Memory Matrix Adaptation for Large Scale Black-box Optimization

May 18, 2017

SGDR: Stochastic Gradient Descent with Warm Restarts

May 03, 2017

Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)

May 09, 2016