Muling Wu

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Jan 07, 2026
Via arXiv

Enhancing Model Privacy in Federated Learning with Random Masking and Quantization

Aug 27, 2025
Via arXiv

Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning

Jun 04, 2025
Via arXiv

RECAST: Strengthening LLMs' Complex Instruction Following with Constraint-Verifiable Data

May 25, 2025
Via arXiv

Improving RL Exploration for LLM Reasoning through Retrospective Replay

Apr 19, 2025
Via arXiv

Multi-Programming Language Sandbox for LLMs

Oct 30, 2024
Via arXiv

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

Sep 25, 2024
Via arXiv

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Jul 08, 2024
Via arXiv

Towards Biologically Plausible Computing: A Comprehensive Comparison

Jun 23, 2024
Via arXiv

Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

Jun 16, 2024
Via arXiv