Sequence Parallelism


Sequence parallelism is a memory-efficient parallelism technique that lifts the input-sequence-length limitation and makes training with longer sequences on GPUs practical. It extends tensor-level model parallelism by distributing the compute and the activation memory of transformer layers across multiple GPUs along the sequence dimension. This is particularly useful for the parts of a layer that tensor parallelism leaves unparallelized, such as layer normalization and dropout, whose activations would otherwise be replicated on every GPU, and it improves overall training efficiency.
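
As a concrete illustration, below is a minimal single-process sketch of the idea in PyTorch; the names (e.g. tp_world_size) are illustrative, not taken from any particular framework. It splits a layer's activations along the sequence dimension, applies a per-token operation (LayerNorm) to each shard independently, and re-gathers the shards where the full sequence is needed, such as before self-attention. In a real multi-GPU run each shard would live on its own device and the chunk/concatenate steps would be scatter and all-gather collectives (e.g. torch.distributed.all_gather).

```python
# Minimal sketch of sequence parallelism, simulated on a single process.
# "tp_world_size" ranks are emulated by chunking the activations along the
# sequence dimension; on real hardware each shard would sit on its own GPU.
import torch
import torch.nn as nn

tp_world_size = 4                        # size of the sequence-parallel group (illustrative)
batch, seq_len, hidden = 2, 1024, 512
assert seq_len % tp_world_size == 0

x = torch.randn(batch, seq_len, hidden)  # full activations of one layer
norm = nn.LayerNorm(hidden)

# Scatter: each "rank" keeps only seq_len / tp_world_size tokens, so the
# activation memory of per-token ops (LayerNorm, dropout) per device
# shrinks by the group size instead of being fully replicated.
shards = torch.chunk(x, tp_world_size, dim=1)

# Per-token ops need no communication and run on the local shard only.
local_out = [norm(s) for s in shards]

# Gather along the sequence dimension before attention, which has to
# attend over every token of the sequence.
full = torch.cat(local_out, dim=1)

# Sanity check: identical to applying LayerNorm to the unsplit tensor.
assert torch.allclose(full, norm(x), atol=1e-6)
print("per-device shard shape:", tuple(shards[0].shape))
```

The sketch only shows the memory-saving split itself; in practice the gather and scatter are paired with the communication already required by tensor parallelism. The entries below are recent papers indexed under this topic: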

Efficient Spatial-Temporal Modeling for Real-Time Video Analysis: A Unified Framework for Action Recognition and Object Tracking (Jul 30, 2025)

Accelerating Parallel Diffusion Model Serving with Residual Compression (Jul 23, 2025)

Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit (Jul 22, 2025)

Scaling RL to Long Videos (Jul 10, 2025)

HyperEvent: Learning Cohesive Events for Large-scale Dynamic Link Prediction (Jul 16, 2025)

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential (Jul 16, 2025)

ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention (Jul 02, 2025)

RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes (Jul 03, 2025)

Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models (Jun 23, 2025)

A Scalable Hybrid Training Approach for Recurrent Spiking Neural Networks (Jun 17, 2025)