
Xiaoge Deng

Breaking Memory Limits: Gradient Wavelet Transform Enhances LLMs Training

Jan 13, 2025

Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks

Dec 22, 2024

Federated Prediction-Powered Inference from Decentralized Data

Sep 03, 2024

Score-based Generative Models with Adaptive Momentum

May 22, 2024

Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients

Mar 25, 2024

Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent

Aug 18, 2023

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning

Oct 05, 2021