MobileBERT


Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs

Mar 21, 2025

Resource-Efficient Transformer Architecture: Optimizing Memory and Execution Time for Real-Time Applications

Dec 25, 2024

Efficient Deployment of Transformer Models in Analog In-Memory Computing Hardware

Nov 26, 2024

On-Device Emoji Classifier Trained with GPT-based Data Augmentation for a Mobile Keyboard

Nov 06, 2024

On Importance of Pruning and Distillation for Efficient Low Resource NLP

Sep 21, 2024

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Aug 05, 2024

Quantized Transformer Language Model Implementations on Edge Devices

Oct 06, 2023

ZipLM: Hardware-Aware Structured Pruning of Language Models

Feb 07, 2023

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

Jan 21, 2022

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

Sep 16, 2021