Amir Yazdanbakhsh

OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction

Dec 15, 2025

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

Nov 11, 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines

Apr 16, 2025

Beyond Moore's Law: Harnessing the Redshift of Generative AI with Effective Hardware-Software Co-Design

Apr 09, 2025

Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion

Mar 29, 2025

RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving

Mar 21, 2025

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Feb 17, 2025

QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

Jan 06, 2025

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Oct 27, 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Jun 11, 2024