Baris Kasikci

Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging

Dec 09, 2025

Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation

Mar 18, 2025

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Feb 28, 2025

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Feb 27, 2025

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs

Feb 17, 2025

Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models

Jan 24, 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Jan 02, 2025

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Nov 25, 2024

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Feb 10, 2024