Zhongfeng Wang

Enable Lightweight and Precision-Scalable Posit/IEEE-754 Arithmetic in RISC-V Cores for Transprecision Computing

May 25, 2025

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization

May 25, 2025

FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding

May 23, 2025

Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors

May 01, 2025

Block Circulant Adapter for Large Language Models

May 01, 2025

CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model

Apr 08, 2025

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format

Nov 24, 2024

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

Nov 21, 2024

M$^2$-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization

Oct 10, 2024

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Sep 26, 2024