Margaret Li

Precise Information Control in Long-Form Text Generation

Jun 06, 2025

(Mis)Fitting: A Survey of Scaling Laws

Feb 26, 2025

Byte Latent Transformer: Patches Scale Better Than Tokens

Dec 13, 2024

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Jul 02, 2024

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

Jan 19, 2024

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Oct 20, 2023

Scaling Expert Language Models with Unsupervised Domain Discovery

Mar 24, 2023

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

Aug 05, 2022

Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

Jul 26, 2021

Recipes for Safety in Open-domain Chatbots

Oct 22, 2020