Zhaozhuo Xu

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

Jul 01, 2024

TorchOpera: A Compound AI System for LLM Safety

Jun 16, 2024

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

Jun 05, 2024

Token-wise Influential Training Data Retrieval for Large Language Models

May 20, 2024

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

May 07, 2024

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Mar 02, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Feb 05, 2024

LLM Multi-Agent Systems: Challenges and Open Problems

Feb 05, 2024

LETA: Learning Transferable Attribution for Generic Vision Explainer

Dec 23, 2023

Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training

Sep 23, 2023