Xinpeng Wang

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Oct 01, 2025

Refusal Direction is Universal Across Safety-Aligned Languages

May 22, 2025

Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study

Dec 17, 2024

Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination

Oct 24, 2024

FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Oct 15, 2024

DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination

Oct 06, 2024

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Oct 04, 2024

"Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

Jun 25, 2024

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Jun 16, 2024

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

May 28, 2024