Yintao He

TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading

Mar 01, 2026
Via arXiv

PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

Feb 21, 2025
Via arXiv

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

Jan 23, 2024
Via arXiv