Ben He

Rank4Gen: RAG-Preference-Aligned Document Set Selection and Ranking
Jan 16, 2026

Coupled Variational Reinforcement Learning for Language Model General Reasoning
Dec 14, 2025

Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning
Apr 01, 2025

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
Feb 24, 2025

SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency
Feb 04, 2025

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
Jan 07, 2025

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
Jan 03, 2025

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Nov 18, 2024

DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
Nov 05, 2024

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Oct 08, 2024