Suyeon Jang

T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization

Nov 17, 2025
Via arXiv

QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention

Nov 17, 2025
Via arXiv