Lechao Xiao

Scaling Exponents Across Parameterizations and Optimizers

Jul 08, 2024

4+3 Phases of Compute-Optimal Neural Scaling Laws

May 23, 2024

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Dec 22, 2023

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

Nov 15, 2023

Small-scale proxies for large-scale Transformer training instabilities

Sep 25, 2023

Fast Neural Kernel Embeddings for General Activations

Sep 09, 2022

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Jul 11, 2022

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

May 30, 2022

Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks

Dec 10, 2021

Dataset Distillation with Infinitely Wide Convolutional Networks

Jul 27, 2021