Junyuan Shang

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

Mar 25, 2026

Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation

Mar 05, 2026

ERNIE 5.0 Technical Report

Feb 04, 2026

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Jan 09, 2026

Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking

Feb 19, 2025

Mixture of Hidden-Dimensions Transformer

Dec 10, 2024

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Oct 17, 2024

BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption

Sep 29, 2024

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

Aug 07, 2024

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Jun 03, 2024