
Xuehai Qian

Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality

Dec 24, 2025

GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping

Dec 19, 2025

Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems

Jan 09, 2024

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

Nov 27, 2023

GNNPipe: Accelerating Distributed Full-Graph GNN Training with Pipelined Model Parallelism

Aug 19, 2023

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

Oct 30, 2022

PAN: Pulse Ansatz on NISQ Machines

Aug 02, 2022

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

Aug 25, 2021

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

Jun 16, 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

May 04, 2021