Mike Lewis

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

May 06, 2024

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Oct 20, 2023

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

Oct 08, 2023

Contrastive Decoding Improves Reasoning in Large Language Models

Sep 29, 2023

Efficient Streaming Language Models with Attention Sinks

Sep 29, 2023

Effective Long-Context Scaling of Foundation Models

Sep 27, 2023

Self-Alignment with Instruction Backtranslation

Aug 14, 2023

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

May 24, 2023

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

May 23, 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

May 19, 2023