Gennady Pekhimenko

DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

Sep 03, 2025 (via arXiv)

Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving

Apr 25, 2025 (via arXiv)

Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis

Apr 22, 2025 (via arXiv)

A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving

Apr 17, 2025 (via arXiv)

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

Mar 24, 2025 (via arXiv)

Seesaw: High-throughput LLM Inference via Model Re-sharding

Mar 09, 2025 (via arXiv)

APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

Jun 19, 2024 (via arXiv)

Proteus: Preserving Model Confidentiality during Graph Optimizations

Apr 18, 2024 (via arXiv)

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

Feb 26, 2024 (via arXiv)

The Synergy of Speculative Decoding and Batching in Serving Large Language Models

Oct 28, 2023 (via arXiv)