
Cong Xie

Cautious Weight Decay

Oct 14, 2025

Truncated Proximal Policy Optimization

Jun 18, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

May 19, 2025

SerialGen: Personalized Image Generation by First Standardization Then Personalization

Dec 02, 2024

Distributed Sign Momentum with Local Steps for Training Transformers

Nov 26, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

Oct 20, 2024

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Oct 15, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Feb 23, 2024

Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding

Jan 28, 2024

LEMON: Lossless model expansion

Oct 12, 2023