Huimu Yu

Libra: Large Chinese-based Safeguard for AI Content

Jul 29, 2025

Ziyang Chen, Huimu Yu, Xing Wu, Dongqin Liu, Songlin Hu

Abstract: Large language models (LLMs) excel in text understanding and generation but raise significant safety and ethical concerns in high-stakes applications. To mitigate these risks, we present Libra-Guard, a cutting-edge safeguard system designed to enhance the safety of Chinese-based LLMs. Leveraging a two-stage curriculum training pipeline, Libra-Guard enhances data efficiency by employing guard pretraining on synthetic samples, followed by fine-tuning on high-quality, real-world data, thereby significantly reducing reliance on manual annotations. To enable rigorous safety evaluations, we also introduce Libra-Test, the first benchmark specifically designed to evaluate the effectiveness of safeguard systems for Chinese content. It covers seven critical harm scenarios and includes over 5,700 samples annotated by domain experts. Experiments show that Libra-Guard achieves 86.79% accuracy, outperforming Qwen2.5-14B-Instruct (74.33%) and ShieldLM-Qwen-14B-Chat (65.69%), and nearing closed-source models like Claude-3.5-Sonnet and GPT-4o. These contributions establish a robust framework for advancing the safety governance of Chinese LLMs and represent a tentative step toward developing safer, more reliable Chinese AI systems.
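The accuracy figures quoted in the abstract are, in the usual sense, the fraction of benchmark samples on which a safeguard's verdict matches the expert label. The sketch below illustrates only that computation. The sample texts, the `safe`/`unsafe` label names, and the `toy_guard` keyword classifier are hypothetical stand-ins, not Libra-Test data or the Libra-Guard model.

```python
from typing import Callable, Iterable, Tuple

def guard_accuracy(
    samples: Iterable[Tuple[str, str]],
    classify: Callable[[str], str],
) -> float:
    """Fraction of (text, gold_label) samples where classify(text) == gold_label."""
    total = 0
    correct = 0
    for text, gold_label in samples:
        total += 1
        if classify(text) == gold_label:
            correct += 1
    if total == 0:
        raise ValueError("no samples to evaluate")
    return correct / total

def toy_guard(text: str) -> str:
    """Hypothetical keyword-based safeguard, used only to exercise the metric."""
    return "unsafe" if "bomb" in text else "safe"

# Hypothetical labeled samples (not taken from any benchmark).
toy_samples = [
    ("how to bake bread", "safe"),
    ("how to build a bomb", "unsafe"),
    ("tell me a joke", "safe"),
    ("insult my coworker", "unsafe"),
]

toy_accuracy = guard_accuracy(toy_samples, toy_guard)
```

The toy guard misses the last sample, so it scores 3 of 4. A real evaluation would replace `toy_guard` with a call to the safeguard model under test.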

CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning

Oct 03, 2024

Huimu Yu, Xing Wu, Weidong Yin, Debing Zhang, Songlin Hu

Abstract: Large language models (LLMs) have made significant progress in natural language understanding and generation, driven by scalable pretraining and advanced finetuning. However, enhancing reasoning abilities in LLMs, particularly via reinforcement learning from human feedback (RLHF), remains challenging due to the scarcity of high-quality preference data, which is labor-intensive to annotate and crucial for reward model (RM) finetuning. To alleviate this issue, we introduce CodePMP, a scalable preference model pretraining (PMP) pipeline that utilizes a large corpus of synthesized code-preference pairs from publicly available high-quality source code. CodePMP improves RM finetuning efficiency by pretraining preference models on large-scale synthesized code-preference pairs. We evaluate CodePMP on mathematical reasoning tasks (GSM8K, MATH) and logical reasoning tasks (ReClor, LogiQA2.0), consistently showing significant improvements in reasoning performance of LLMs and highlighting the importance of scalable preference model pretraining for efficient reward modeling.

* work in progress
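The abstract does not state which objective CodePMP trains on. As an illustration of how a preference model can learn from (chosen, rejected) pairs in general, the sketch below uses the pairwise Bradley-Terry loss that is common in reward modeling, i.e. the negative log-sigmoid of the score margin. Treating this as the loss is an assumption, and the function names and example scores are hypothetical.

```python
import math
from typing import Sequence, Tuple

def pairwise_preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(chosen_score - rejected_score)).

    Computed as softplus(-margin) in a numerically stable way.
    """
    margin = chosen_score - rejected_score
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def mean_pairwise_loss(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs."""
    if not score_pairs:
        raise ValueError("no pairs given")
    return sum(pairwise_preference_loss(c, r) for c, r in score_pairs) / len(score_pairs)

# Hypothetical scalar scores a preference model might assign to two code responses.
well_ranked = pairwise_preference_loss(2.0, 0.0)   # chosen scored higher: small loss
mis_ranked = pairwise_preference_loss(0.0, 2.0)    # chosen scored lower: large loss
tied = pairwise_preference_loss(1.0, 1.0)          # equal scores: loss is log(2)
```

The loss shrinks as the model scores the preferred response further above the rejected one, which is what lets large numbers of synthesized pairs act as a training signal without human labels.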
