Zhiming Ma

SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning

Jan 04, 2026
Via arXiv

HI-TransPA: Hearing Impairments Translation Personal Assistant

Nov 14, 2025
Via arXiv

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management

May 19, 2025
Via arXiv

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Apr 06, 2025
Via arXiv

TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

Apr 01, 2025
Via arXiv

SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

Feb 12, 2025
Via arXiv

Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms

Feb 05, 2025
Via arXiv

Language Models as Continuous Self-Evolving Data Engineers

Dec 19, 2024
Via arXiv

Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion

Dec 06, 2023
Via arXiv

Elastic Information Bottleneck

Nov 07, 2023
Via arXiv