Shizhe Diao

Celine

LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

Apr 22, 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Apr 17, 2025

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

Mar 05, 2025

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Feb 05, 2025

Entropy-Regularized Process Reward Model

Dec 15, 2024

Hymba: A Hybrid-head Architecture for Small Language Models

Nov 20, 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Oct 04, 2024

CodeGraph: Enhancing Graph Reasoning of LLMs with Code

Aug 25, 2024

FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

Aug 22, 2024

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Jul 03, 2024