Sebastian Jaszczur

Scaling Laws for Fine-Grained Mixture of Experts

Feb 12, 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Jan 08, 2024

Structured Packing in LLM Training Improves Long Context Utilization

Jan 02, 2024

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

Oct 24, 2023

Sparse is Enough in Scaling Transformers

Nov 24, 2021

Neural heuristics for SAT solving

May 27, 2020