Minyi Guo

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

May 06, 2026

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

May 06, 2026

CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems

Mar 30, 2026

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training

Jan 29, 2026

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding

Dec 29, 2025

Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Nov 19, 2025

MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts

Nov 18, 2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

Sep 11, 2025

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Aug 26, 2025

Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

Aug 01, 2025