LLM-as-Benchmark-Generator methods have been widely studied as a supplement to human annotators for scalable evaluation, yet the potential biases within this paradigm remain underexplored. In this work, we systematically define and validate the phenomenon of inflated performance in models evaluated on their self-generated benchmarks, which we refer to as self-bias, and attribute it to sub-biases arising from question domain, language style, and incorrect labels. On this basis, we propose Silencer, a general framework that leverages the heterogeneity among multiple generators at both the sample and benchmark levels to neutralize bias and generate high-quality, self-bias-silenced benchmarks. Experimental results across various settings demonstrate that Silencer suppresses self-bias to near zero and significantly improves the evaluation effectiveness of the generated benchmark (raising the average Pearson correlation with a high-quality human-annotated benchmark from 0.655 to 0.833), while also exhibiting strong generalizability.
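
To make the two quantities in the abstract concrete, the sketch below shows one plausible way to compute them; it is illustrative only, not the paper's implementation, and the model names and scores are hypothetical. Evaluation effectiveness is taken as the Pearson correlation between per-model scores on the generated benchmark and on a human-annotated reference, and self-bias is estimated as the generator model's score inflation on its own benchmark beyond the average shift across all models.

```python
# Illustrative sketch (hypothetical data, not the paper's code): quantifying
# "evaluation effectiveness" and "self-bias" for an LLM-generated benchmark.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-model accuracies on a human-annotated reference benchmark
# and on a benchmark generated by "model_a".
human_scores     = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58, "model_d": 0.49}
generated_scores = {"model_a": 0.80, "model_b": 0.62, "model_c": 0.55, "model_d": 0.47}

models = sorted(human_scores)
h = np.array([human_scores[m] for m in models])
g = np.array([generated_scores[m] for m in models])

# Evaluation effectiveness: Pearson correlation between the score vectors
# induced by the generated benchmark and the human-annotated one.
effectiveness, _ = pearsonr(g, h)

# Self-bias (one simple proxy): the generator model's score inflation on its
# own benchmark, in excess of the average shift observed across all models.
avg_shift = np.mean(g - h)
self_bias = (generated_scores["model_a"] - human_scores["model_a"]) - avg_shift

print(f"evaluation effectiveness (Pearson r): {effectiveness:.3f}")
print(f"self-bias of the generator model:     {self_bias:+.3f}")
```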