
Hui-Ling Zhen

Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

Jan 14, 2026

SwiftMem: Fast Agentic Memory via Query-aware Indexing

Jan 13, 2026

Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence

Jan 08, 2026

What Matters For Safety Alignment?

Jan 07, 2026

Towards Efficient Agents: A Co-Design of Inference Architecture and System

Dec 20, 2025

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Dec 17, 2025

MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling

Nov 08, 2025

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

May 23, 2025

TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling

May 22, 2025

Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC

May 22, 2025