Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yingying Cheng

Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training

May 27, 2026

Yingying Cheng, Jinquan Shi, Li Zhou, Zhiyang He, Zhaoyi Sun, Fan Zhang, Jie Sun

Abstract:Quantization-aware training (QAT) with low-bit floating-point formats enables efficient LLM deployment, yet introduces subtle failure modes invisible to standard training metrics. We present a systematic study of HiF8 W8A8 QAT for OpenPangu-Embedded-1B through the lens of Delayed Tensor Scaling (DTS). Across eight controlled experiments, we identify and disentangle two orthogonal failure modes: (i)amax saturation, where delayed scale estimates silently corrupt knowledge-sensitive representations via forward-pass clipping, and (ii)catastrophic forgetting, where an aggressive learning rate overwrites pretrained commonsense knowledge independently of quantization. Neither is detectable from training loss alone. We address amax saturation with a conservative max-algorithm DTS strategy over a 64-step history window, and mitigate forgetting via a 500-step BF16 warmup followed by QAT at lr=10^{-5}. Both fixes are necessary and sufficient: our final configuration achieves 0.43% MMLU drop, 0.58% HellaSwag drop, and 0.22% ARC-Challenge drop versus a matched BF16 baseline, with a training loss APE of only 0.11% over 10,000 steps.

Via

Access Paper or Ask Questions

GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Aug 18, 2025

Jinquan Shi, Yingying Cheng, Fan Zhang, Miao Jiang, Jun Lin, Yanbai Shen

Figure 1 for GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Figure 2 for GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Figure 3 for GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Figure 4 for GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Abstract:The global shift towards renewable energy presents unprecedented challenges for the electricity industry, making regulatory reasoning and compliance increasingly vital. Grid codes, the regulations governing grid operations, are complex and often lack automated interpretation solutions, which hinders industry expansion and undermines profitability for electricity companies. We introduce GridCodex, an end to end framework for grid code reasoning and compliance that leverages large language models and retrieval-augmented generation (RAG). Our framework advances conventional RAG workflows through multi stage query refinement and enhanced retrieval with RAPTOR. We validate the effectiveness of GridCodex with comprehensive benchmarks, including automated answer assessment across multiple dimensions and regulatory agencies. Experimental results showcase a 26.4% improvement in answer quality and more than a 10 fold increase in recall rate. An ablation study further examines the impact of base model selection.

Via

Access Paper or Ask Questions