
Shaobo Ma

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration

Aug 26, 2025

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization

May 25, 2025

FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding

May 23, 2025

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Sep 26, 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Jul 16, 2024