Eugene Belilovsky

MILA

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Jul 07, 2024
Via arXiv

Controlling Forgetting with Test-Time Data in Continual Learning

Jun 19, 2024
Via arXiv

PETRA: Parallel End-to-end Training with Reversible Architectures

Jun 04, 2024
Via arXiv

From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

Jun 03, 2024
Via arXiv

ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

Jun 03, 2024
Via arXiv

Temporally Consistent Object Editing in Videos using Extended Attention

Jun 01, 2024
Via arXiv

$μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

May 31, 2024
Via arXiv

WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

May 27, 2024
Via arXiv

AdaFisher: Adaptive Second Order Optimization via Fisher Information

May 26, 2024
Via arXiv

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Mar 26, 2024
Via arXiv