
Rameswar Panda

Scaling Granite Code Models to 128K Context (Jul 18, 2024)

The infrastructure powering IBM's Gen AI model development (Jul 07, 2024)

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks (Jun 27, 2024)

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts (Jun 17, 2024)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (May 21, 2024)

Granite Code Models: A Family of Open Foundation Models for Code Intelligence (May 07, 2024)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (Apr 08, 2024)

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization (Apr 04, 2024)

Scattered Mixture-of-Experts Implementation (Mar 13, 2024)

API Pack: A Massive Multilingual Dataset for API Call Generation (Feb 16, 2024)