Xiaoying Jia

Model Merging in Pre-training of Large Language Models
May 17, 2025

Seed1.5-VL Technical Report
May 11, 2025

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Feb 23, 2024

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Oct 06, 2022

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Aug 29, 2020