Parameswaran Raman

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

Apr 16, 2024

EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Apr 16, 2024

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

Apr 11, 2024

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Jan 17, 2024

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

Jan 05, 2024

Contractive Error Feedback for Gradient Compression

Dec 13, 2023

DS-FACTO: Doubly Separable Factorization Machines

Apr 29, 2020

Optimization on the Surface of the $(n-1)$-Sphere

Sep 13, 2019

DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

Aug 03, 2018

Extreme Stochastic Variational Inference: Distributed and Asynchronous

Aug 03, 2018