Grace Li Zhang

KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

Apr 14, 2026
Via arXiv

OptINC: Optical In-Network-Computing for Scalable Distributed Learning

Mar 30, 2026
Via arXiv

Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing

Jan 29, 2026
Via arXiv

PEL-NAS: Search Space Partitioned Architecture Prompt Co-Evolutionary LLM-driven Hardware-Aware Neural Architecture Search

Oct 01, 2025
Via arXiv

Revolution or Hype? Seeking the Limits of Large Models in Hardware Design

Sep 05, 2025
Via arXiv

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Oct 02, 2024
Via arXiv

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

Jul 04, 2024
Via arXiv

LiveMind: Low-latency Large Language Models with Simultaneous Inference

Jun 20, 2024
Via arXiv

EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration

Feb 25, 2024
Via arXiv

Class-Aware Pruning for Efficient Neural Networks

Dec 10, 2023
Via arXiv