Martin Jaggi
EPFL

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

May 29, 2024

Deep Grokking: Would Deep Neural Networks Generalize Better?

May 29, 2024

The Privacy Power of Correlated Noise in Decentralized Learning

May 02, 2024

Personalized Collaborative Fine-Tuning for On-Device Large Language Models

Apr 15, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Mar 30, 2024

Towards an empirical understanding of MoE design choices

Feb 20, 2024

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

Feb 06, 2024

InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks

Feb 05, 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023