
Rameswar Panda

The infrastructure powering IBM's Gen AI model development (Jul 07, 2024)

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks (Jun 27, 2024)

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts (Jun 17, 2024)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (May 21, 2024)

Granite Code Models: A Family of Open Foundation Models for Code Intelligence (May 07, 2024)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (Apr 08, 2024)

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization (Apr 04, 2024)

Scattered Mixture-of-Experts Implementation (Mar 13, 2024)

API Pack: A Massive Multilingual Dataset for API Call Generation (Feb 16, 2024)

Data Engineering for Scaling Language Models to 128K Context (Feb 15, 2024)