Zhi-Qin John Xu

Adaptive Preconditioners Trigger Loss Spikes in Adam

Jun 05, 2025

Scalable Complexity Control Facilitates Reasoning Ability of LLMs

May 29, 2025

MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

May 28, 2025

An overview of condensation phenomenon in deep learning

Apr 13, 2025

An Analysis for Reasoning Bias of Language Models with Small Initialization

Feb 05, 2025

Reasoning Bias of Next Token Prediction Training

Feb 04, 2025

Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Jan 15, 2025

A rationale from frequency perspective for grokking in training neural network

May 24, 2024

Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

May 24, 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

May 08, 2024