Zifan Peng

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Dec 14, 2025

Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines

Dec 10, 2025

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Nov 18, 2025

ZPD-SCA: Unveiling the Blind Spots of LLMs in Assessing Students' Cognitive Abilities

Aug 20, 2025

JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

May 23, 2025

"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments

May 07, 2025

Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications

Apr 30, 2025

Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models

Apr 18, 2025

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025

Automatic Pruning via Structured Lasso with Class-wise Information

Feb 13, 2025