Xinran Gu

Kimi K2: Open Agentic Intelligence

Jul 28, 2025

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

May 23, 2025

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Feb 28, 2024

A Quadratic Synchronization Rule for Distributed Deep Learning

Oct 22, 2023

Why does Local SGD Generalize Better than SGD?

Mar 09, 2023

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

Jun 08, 2021