Zili Wang

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Feb 11, 2026

Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions

Feb 10, 2026

CodeSimpleQA: Scaling Factuality in Code Large Language Models

Dec 22, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Nov 13, 2025

Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains

Jul 09, 2025

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

Jul 02, 2025

Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?

Jun 13, 2025

Farseer: A Refined Scaling Law in Large Language Models

Jun 12, 2025

Faster and Better LLMs via Latency-Aware Test-Time Scaling

May 26, 2025

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

Mar 06, 2025