
Eugene Belilovsky

MILA

MuLoCo: Muon is a practical inner optimizer for DiLoCo

May 29, 2025
Via arXiv

Incentivizing Permissionless Distributed Learning of LLMs

May 27, 2025
Via arXiv

Continual Pre-training of MoEs: How robust is your router?

Mar 06, 2025
Via arXiv

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Mar 06, 2025
Via arXiv

FairDropout: Using Example-Tied Dropout to Enhance Generalization of Minority Groups

Feb 10, 2025
Via arXiv

Non-Uniform Parameter-Wise Model Merging

Dec 20, 2024
Via arXiv

Sketch-guided Cage-based 3D Gaussian Splatting Deformation

Nov 19, 2024
Via arXiv

Towards motion from video diffusion models

Nov 19, 2024
Via arXiv

Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting

Sep 23, 2024
Via arXiv

Accelerating Training with Neuron Interaction and Nowcasting Networks

Sep 06, 2024
Via arXiv