
Guangxuan Xiao

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

May 07, 2024

Retrieval Head Mechanistically Explains Long-Context Factuality

Apr 24, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Feb 28, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

Feb 07, 2024

Efficient Streaming Language Models with Attention Sinks

Sep 29, 2023

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

May 21, 2023

Sparse and Local Networks for Hypergraph Reasoning

Mar 09, 2023

Offsite-Tuning: Transfer Learning without Full Model

Feb 09, 2023

ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training

Jan 19, 2023