Shiwei Liu

A Technical Study into Small Reasoning Language Models

Jun 16, 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

May 29, 2025

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling

May 23, 2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Feb 27, 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Feb 24, 2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Feb 11, 2025

The Curse of Depth in Large Language Models

Feb 09, 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Jan 12, 2025

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Dec 18, 2024

SIDE: Socially Informed Drought Estimation Toward Understanding Societal Impact Dynamics of Environmental Crisis

Dec 17, 2024