
Jiawei Zhao

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

The Path Not Taken: RLVR Provably Learns Off the Principals

Nov 11, 2025

GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection

Apr 29, 2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Feb 18, 2025

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Feb 04, 2025

Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition

Jan 04, 2025

S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

Dec 10, 2024

Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation

Oct 21, 2024

Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks

Aug 22, 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

Jul 22, 2024