
Yuandong Tian

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Oct 03, 2023
Via arXiv

GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature

Oct 03, 2023
Via arXiv

Efficient Streaming Language Models with Attention Sinks

Sep 29, 2023
Via arXiv

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

Jul 24, 2023
Via arXiv

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Jul 19, 2023
Via arXiv

Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information

Jul 18, 2023
Via arXiv

Extending Context Window of Large Language Models via Positional Interpolation

Jun 28, 2023
Via arXiv

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer

May 25, 2023
Via arXiv

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

May 03, 2023
Via arXiv

A Cookbook of Self-Supervised Learning

Apr 24, 2023
Via arXiv