Rachel Ward

Phi-4 Technical Report

Dec 12, 2024
Via arXiv

Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks

Oct 12, 2024
Via arXiv

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024
Via arXiv

TinyGSM: achieving >80% on GSM8k with small language models

Dec 14, 2023
Via arXiv

Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering

Jul 20, 2023
Via arXiv

Convergence of Alternating Gradient Descent for Matrix Factorization

May 11, 2023
Via arXiv

Robust Implicit Regularization via Weight Normalization

May 09, 2023
Via arXiv

AdaWAC: Adaptively Weighted Augmentation Consistency Regularization for Volumetric Medical Image Segmentation

Oct 04, 2022
Via arXiv

On the fast convergence of minibatch heavy ball momentum

Jun 15, 2022
Via arXiv

How catastrophic can catastrophic forgetting be in linear regression?

May 25, 2022
Via arXiv