Zihao Zeng

Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

May 26, 2025
Via arXiv

Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition

May 26, 2025
Via arXiv

RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability

Apr 14, 2025
Via arXiv

SIFT: Grounding LLM Reasoning in Contexts via Stickers

Feb 19, 2025
Via arXiv

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Oct 23, 2024
Via arXiv

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Oct 16, 2024
Via arXiv

In-context KV-Cache Eviction for LLMs via Attention-Gate

Oct 15, 2024
Via arXiv

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

Jun 19, 2024
Via arXiv