Yutao Sun

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

Jun 10, 2025

Reinforcement Pre-Training

Jun 09, 2025

Rectified Sparse Attention

Jun 05, 2025

The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?

Feb 19, 2025

Multimodal Latent Language Modeling with Next-Token Diffusion

Dec 11, 2024

Differential Transformer

Oct 07, 2024

FocusLLM: Scaling LLM's Context by Parallel Decoding

Aug 21, 2024

Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach

Jun 17, 2024

HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

Jun 06, 2024

You Only Cache Once: Decoder-Decoder Architectures for Language Models

May 08, 2024