
Xiaonan Nie

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Jun 10, 2025

Emerging Properties in Unified Multimodal Pretraining

May 20, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs

Feb 28, 2025

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

Sep 02, 2024

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Aug 27, 2024

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

Jul 16, 2024

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

May 01, 2024

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Jul 05, 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Apr 08, 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Mar 06, 2023