Chien-Yu Lin

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

Oct 06, 2025
Via arXiv

xKV: Cross-Layer SVD for KV-Cache Compression

Mar 24, 2025
Via arXiv

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Feb 28, 2025
Via arXiv

Palu: Compressing KV-Cache with Low-Rank Projection

Jul 30, 2024
Via arXiv

Encode Once and Decode in Parallel: Efficient Transformer Decoding

Mar 19, 2024
Via arXiv

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

Dec 20, 2023
Via arXiv

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Via arXiv

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

Jul 21, 2022
Via arXiv

Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

Nov 01, 2021
Via arXiv

Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks

Apr 23, 2021
Via arXiv