Zhiming Ma

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management

May 19, 2025

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Apr 06, 2025

TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

Apr 01, 2025

SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

Feb 12, 2025

Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms

Feb 05, 2025

Language Models as Continuous Self-Evolving Data Engineers

Dec 19, 2024

Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion

Dec 06, 2023

Elastic Information Bottleneck

Nov 07, 2023

Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Oct 11, 2023

Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

Jun 15, 2023