Mohammad Rastegari

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

May 08, 2024
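
The title names a concrete mechanism: speeding up the prefill phase by generating the key-value cache in parallel pieces rather than in one monolithic pass. Below is a toy, single-process sketch (not the paper's parallel algorithm) of why chunk-by-chunk KV generation is exact: each segment computes only its own K/V and attends to the cache accumulated from earlier segments. All names and sizes are illustrative.

```python
# Toy simulation of chunked causal prefill: segments build the KV cache
# incrementally, and the result matches full causal attention exactly.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def chunked_causal_attention(Q, K, V, n_chunks=4):
    T, d = Q.shape
    bounds = np.linspace(0, T, n_chunks + 1, dtype=int)
    k_cache, v_cache, outputs = [], [], []
    for i in range(n_chunks):
        s, e = bounds[i], bounds[i + 1]
        k_cache.append(K[s:e]); v_cache.append(V[s:e])   # this segment's KV only
        Kc, Vc = np.concatenate(k_cache), np.concatenate(v_cache)
        scores = Q[s:e] @ Kc.T / np.sqrt(d)
        # Causal mask inside the current segment; earlier segments are fully visible.
        for r in range(e - s):
            scores[r, s + r + 1:] = -np.inf
        outputs.append(softmax(scores) @ Vc)
    return np.concatenate(outputs)

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = rng.normal(size=(3, T, d))
# Reference: full causal attention in one shot.
ref_scores = Q @ K.T / np.sqrt(d)
ref_scores[np.triu_indices(T, k=1)] = -np.inf
ref = softmax(ref_scores) @ V
assert np.allclose(chunked_causal_attention(Q, K, V), ref)
print("chunked KV generation matches full causal attention")
```

In the parallel setting the title describes, each chunk would live on its own worker and only the KV tensors (not attention outputs) need to flow between them, which is what makes the prefill parallelizable.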

OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

May 02, 2024
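
One idea behind OpenELM's efficiency is layer-wise scaling: rather than repeating identical transformer blocks, per-layer head counts and FFN widths grow with depth. A minimal sketch of such an allocation; the interpolation bounds below are assumptions for illustration, not the released configurations.

```python
# Sketch of layer-wise scaling: width factors are linearly interpolated
# across depth, so early layers are narrow and late layers are wide.
def layerwise_scaling(n_layers, d_model, head_dim=64,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    cfgs = []
    for i in range(n_layers):
        t = i / max(n_layers - 1, 1)              # 0 at the first layer, 1 at the last
        a = alpha[0] + t * (alpha[1] - alpha[0])  # attention width factor
        b = beta[0] + t * (beta[1] - beta[0])     # FFN width factor
        n_heads = max(1, int(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        cfgs.append({"layer": i, "n_heads": n_heads, "ffn_dim": ffn_dim})
    return cfgs

for cfg in layerwise_scaling(n_layers=4, d_model=1024):
    print(cfg)
```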

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Apr 24, 2024
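
As the title suggests, the speedup comes from reframing image-text pretraining as classification rather than pairwise contrastive learning: words extracted from captions become multi-label targets for the image encoder. A toy of that framing, with a made-up four-word vocabulary and a trivial caption parser.

```python
# Toy "classification instead of contrastive" objective: captions are turned
# into multi-hot tag vectors, and the image encoder trains with plain BCE,
# with no pairwise similarity matrix over the batch.
import numpy as np

vocab = ["dog", "cat", "ball", "grass"]

def caption_to_multihot(caption):
    words = set(caption.lower().split())
    return np.array([1.0 if w in words else 0.0 for w in vocab])

def bce(logits, targets):
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p + 1e-9) + (1 - targets) * np.log(1 - p + 1e-9)).mean()

target = caption_to_multihot("a dog chasing a ball on grass")
logits = np.array([2.0, -1.5, 1.0, 0.5])   # stand-in for image-encoder outputs
print("multi-hot target:", target, "| loss:", round(float(bce(logits, target)), 3))
```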

Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

Apr 10, 2024
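
The title suggests treating retrieved documents as parallel, independent prompt branches that can be scored and pruned, instead of one long concatenated context. A toy of that control flow only; the scoring and generation functions below are placeholders, not the paper's method.

```python
# Toy branch-and-prune flow for RAG: each document forms an independent
# branch (parallelizable, since no branch attends to another), low-scoring
# branches are discarded, and the answer is produced from the survivors.
def superposed_answer(prefix, docs, query, score, generate, keep=2):
    scored = sorted(docs, key=lambda d: score(prefix, d, query), reverse=True)
    kept = scored[:keep]                      # prune low-relevance branches
    return generate(prefix, kept, query)      # answer from surviving branches

docs = ["doc about llamas", "doc about gpus", "doc about tea"]
score = lambda p, d, q: sum(w in d for w in q.split())   # placeholder relevance
generate = lambda p, kept, q: f"answer({q}) using {kept}"  # placeholder decoder
print(superposed_answer("system prompt", docs, "llamas and gpus", score, generate))
```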

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Feb 16, 2024
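
The title's claim is speculative decoding without an auxiliary draft model: the same model drafts a few future tokens cheaply, then verifies them and keeps the longest agreeing prefix. A toy of the draft-and-verify loop; the arithmetic "model" and its cheaper drafting variant are stand-ins, and the paper's attention-level mechanism is not shown.

```python
# Toy single-model speculative decoding: a cheap drafting pass proposes k
# tokens, a verify pass confirms them, and decoding falls back to the
# verified token at the first mismatch.
def toy_model(ctx):                      # stands in for the full forward pass
    return (sum(ctx) * 31 + len(ctx)) % 100

def toy_draft(ctx, k):                   # cheaper approximation (drops the len term)
    out = list(ctx)
    for _ in range(k):
        out.append((sum(out) * 31) % 100)
    return out[len(ctx):]

def speculative_decode(ctx, n_new, k=4):
    ctx = list(ctx)
    while n_new > 0:
        draft = toy_draft(ctx, min(k, n_new))
        for t in draft:
            verified = toy_model(ctx)    # one batched verify pass in a real system
            ctx.append(verified)
            n_new -= 1
            if verified != t or n_new == 0:
                break                    # mismatch: discard the rest of the draft
    return ctx

print(speculative_decode([1, 2, 3], n_new=6))
```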

Weight subcloning: direct initialization of transformers using larger pretrained ones

Dec 14, 2023
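
The title describes the recipe directly: initialize a smaller transformer from a larger pretrained one by keeping a subset of layers and slicing each weight matrix down to its most important neurons. A minimal sketch for one MLP block, using plain L1 magnitude as the (assumed) importance score, which is one simple choice among several.

```python
# Sketch of weight subcloning for an MLP block: rank hidden neurons by
# importance, keep the top d_small, and slice both matrices consistently.
import numpy as np

def subclone_mlp(w_in, w_out, d_small):
    # w_in: (d_hidden, d_model), w_out: (d_model, d_hidden) from the big model.
    importance = np.abs(w_in).sum(axis=1)      # score each hidden neuron
    keep = np.argsort(importance)[-d_small:]   # indices of the top neurons
    return w_in[keep, :], w_out[:, keep]       # matching slices keep shapes valid

rng = np.random.default_rng(0)
d_model, d_hidden, d_small = 8, 32, 12
w_in = rng.normal(size=(d_hidden, d_model))
w_out = rng.normal(size=(d_model, d_hidden))
w_in_s, w_out_s = subclone_mlp(w_in, w_out, d_small)
print(w_in_s.shape, w_out_s.shape)   # (12, 8) (8, 12): ready to fine-tune
```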

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Dec 12, 2023
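
The setting in the title is inference with weights that do not fit in DRAM: parameters stay on flash, and only the rows needed for the current token are pulled into a small in-memory cache. A toy of on-demand loading under a row budget; the file-backed array, the set of "active" rows, and the LRU eviction policy are all illustrative choices.

```python
# Toy on-demand weight loading: weights live in a memory-mapped file
# (standing in for flash) and a bounded LRU cache holds rows in "DRAM".
import numpy as np, tempfile, os
from collections import OrderedDict

d_hidden, d_model, budget = 1024, 64, 128            # cache at most 128 rows
path = os.path.join(tempfile.gettempdir(), "ffn_weights.npy")
np.save(path, np.random.default_rng(0).normal(size=(d_hidden, d_model)))
flash = np.load(path, mmap_mode="r")                 # weights stay on "flash"

cache = OrderedDict()                                # row index -> in-DRAM copy

def fetch_rows(active):
    out = []
    for i in active:
        if i not in cache:
            if len(cache) >= budget:
                cache.popitem(last=False)            # evict least-recently used
            cache[i] = np.array(flash[i])            # read one row from flash
        cache.move_to_end(i)                         # mark as recently used
        out.append(cache[i])
    return np.stack(out)

active = np.random.default_rng(1).choice(d_hidden, size=50, replace=False)
print(fetch_rows(active).shape, "| rows cached:", len(cache))
```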

Label-efficient Training of Small Task-specific Models by Leveraging Vision Foundation Models

Nov 30, 2023
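
The title's recipe: let a large vision foundation model supervise a small task-specific model on unlabeled images, so that only a few human labels are needed. A toy of a two-term objective one could use for this; the temperature and loss weighting are arbitrary illustration values.

```python
# Toy label-efficient objective: soft distillation from a foundation-model
# teacher on an unlabeled batch, plus supervised loss on a tiny labeled set.
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    p, q = softmax(teacher_logits / T), softmax(student_logits / T)
    return -(p * np.log(q + 1e-9)).sum(-1).mean()   # soft cross-entropy

def supervised_loss(student_logits, labels):
    q = softmax(student_logits)
    return -np.log(q[np.arange(len(labels)), labels] + 1e-9).mean()

rng = np.random.default_rng(0)
unl_student = rng.normal(size=(32, 10))             # student on unlabeled batch
unl_teacher = rng.normal(size=(32, 10))             # teacher on the same batch
lab_student = rng.normal(size=(4, 10))              # student on tiny labeled set
labels = rng.integers(0, 10, size=4)
total = distill_loss(unl_student, unl_teacher) + 0.5 * supervised_loss(lab_student, labels)
print(f"total loss: {total:.3f}")
```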

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Oct 23, 2023
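
The title describes merging two vision foundation models, one spatial (SAM) and one semantic (CLIP), into a single backbone that keeps both skills. A toy of a multi-teacher distillation objective one could use for such a merge: one shared encoder with two light heads, each pulled toward its own teacher. Shapes and the cosine loss are assumptions for illustration.

```python
# Toy multi-teacher merge objective: a shared encoder feeds two heads, and
# each head is distilled toward the corresponding teacher's feature space.
import numpy as np

def cosine_distill(student, teacher):
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return (1.0 - (s * t).sum(-1)).mean()

rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 256))                   # shared-encoder features
head_clip = shared @ rng.normal(size=(256, 512))     # semantic head -> CLIP space
head_sam = shared @ rng.normal(size=(256, 256))      # spatial head  -> SAM space
clip_teacher = rng.normal(size=(8, 512))             # stand-in teacher outputs
sam_teacher = rng.normal(size=(8, 256))
loss = cosine_distill(head_clip, clip_teacher) + cosine_distill(head_sam, sam_teacher)
print(f"merged-model distillation loss: {loss:.3f}")
```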

CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

Oct 21, 2023
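
Pseudo-supervision as the title suggests: off-the-shelf expert models from a model zoo label web images for free, and those pseudo-labels train auxiliary heads on a CLIP-style encoder. Everything below is a stand-in (fake experts, random features), sketching only the shape of the objective.

```python
# Toy pseudo-supervision objective: "expert" models produce dense targets on
# unlabeled web images, and auxiliary heads regress toward those targets.
import numpy as np

rng = np.random.default_rng(0)
depth_expert = lambda imgs: rng.normal(size=(len(imgs), 16))  # pseudo depth maps
seg_expert = lambda imgs: rng.normal(size=(len(imgs), 16))    # pseudo seg logits

def l2(a, b):
    return ((a - b) ** 2).mean()

imgs = [f"web_image_{i}" for i in range(8)]
features = rng.normal(size=(8, 64))                 # CLIP-style image features
depth_head = features @ rng.normal(size=(64, 16))   # auxiliary depth head
seg_head = features @ rng.normal(size=(64, 16))     # auxiliary segmentation head
loss = l2(depth_head, depth_expert(imgs)) + l2(seg_head, seg_expert(imgs))
print(f"pseudo-supervision loss: {loss:.3f}")
```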