Yikang Shen

The infrastructure powering IBM's Gen AI model development

Jul 07, 2024

Octo-planner: On-device Language Model for Planner-Action Agents

Jun 26, 2024

Efficient Continual Pre-training by Mitigating the Stability Gap

Jun 21, 2024

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Jun 10, 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

May 24, 2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

May 07, 2024

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Apr 11, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Apr 08, 2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Mar 14, 2024

Scattered Mixture-of-Experts Implementation

Mar 13, 2024