
Geonhwa Jeong

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

Mar 20, 2025
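For readers unfamiliar with the 2:4 pattern named in this title: it keeps at most 2 non-zero values in every contiguous group of 4, the structured sparsity format accelerated by NVIDIA Sparse Tensor Cores. The NumPy sketch below (the helper name `prune_2_4` is made up for this illustration and is not the paper's code) shows the magnitude-based pruning that enforces this pattern on an activation tensor.

```python
import numpy as np

def prune_2_4(x: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude entries in every contiguous
    group of 4 along the last dimension (the 2:4 structured pattern)."""
    assert x.shape[-1] % 4 == 0, "last dim must be a multiple of 4"
    groups = x.reshape(*x.shape[:-1], -1, 4)            # (..., n/4, 4)
    # Indices of the 2 smallest |values| in each group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(x.shape)

acts = np.random.randn(2, 8).astype(np.float32)
sparse_acts = prune_2_4(acts)
# Every group of 4 now holds at most 2 non-zeros.
assert (sparse_acts.reshape(-1, 4) != 0).sum(axis=-1).max() <= 2
```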

SDQ: Sparse Decomposed Quantization for LLM Inference

Jun 19, 2024

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Jun 03, 2024

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Mar 12, 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Mar 11, 2024
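As a point of reference for the KV-cache compression theme of this paper, the sketch below shows generic per-token min-max quantization of a cached key or value tensor. It only illustrates the general idea of compressing the KV cache, not GEAR's actual near-lossless recipe, and the helper names `quantize_kv` and `dequantize_kv` are made up for this example.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, n_bits: int = 8):
    """Per-token min-max quantization of a cached K or V tensor of
    shape (tokens, heads, head_dim). Returns integer codes plus the
    scale and offset needed to dequantize."""
    flat = kv.reshape(kv.shape[0], -1)                  # one row per token
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**n_bits - 1) + 1e-12
    codes = np.round((flat - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_kv(codes, scale, lo, shape):
    return (codes.astype(np.float32) * scale + lo).reshape(shape)

k_cache = np.random.randn(16, 8, 64).astype(np.float32)   # 16 cached tokens
codes, scale, lo = quantize_kv(k_cache)
k_hat = dequantize_kv(codes, scale, lo, k_cache.shape)
print("max abs reconstruction error:", np.abs(k_cache - k_hat).max())
```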

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Mar 08, 2024

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

Feb 23, 2023

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU

Oct 05, 2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Sep 17, 2021

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

Jun 19, 2021