Beiwen Zhang

Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality

May 22, 2025
Via arXiv

ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention

May 15, 2025
Via arXiv