Kan Wu

Stephen

Spark Transformer: Reactivating Sparsity in FFN and Attention

Jun 07, 2025

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

May 21, 2025

Scaling Laws for Floating Point Quantization Training

Jan 05, 2025

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Nov 05, 2024

Lossless KV Cache Compression to 2%

Oct 20, 2024

Physical design optimization for automated drug dispensing systems in a human-machine interaction environment

Dec 18, 2023

A Quick Response Algorithm for Dynamic Autonomous Mobile Robot Routing Problem with Time Windows

Nov 26, 2023

FP8-LM: Training FP8 Large Language Models

Oct 27, 2023

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

Sep 21, 2023

The Multi-trip Autonomous Mobile Robots Scheduling Problem with Time Windows in a Stochastic Environment at Smart Hospitals

Jul 30, 2023