Yutao Zeng

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Aug 26, 2025

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

May 30, 2025

Scaling Law for Quantization-Aware Training

May 20, 2025

Efficient Pretraining Length Scaling

Apr 21, 2025

Frac-Connections: Fractional Extension of Hyper-Connections

Mar 18, 2025

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Mar 06, 2025

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Feb 21, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Feb 18, 2025

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Jan 28, 2025

Ultra-Sparse Memory Network

Nov 19, 2024