Zhijing Jin

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Nov 13, 2025
Via arXiv

Taming Object Hallucinations with Verified Atomic Confidence Estimation

Nov 12, 2025
Via arXiv

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Oct 06, 2025
Via arXiv

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

Aug 06, 2025
Via arXiv

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

Jun 15, 2025
Via arXiv

Improving Large Language Model Safety with Contrastive Representation Learning

Jun 13, 2025
Via arXiv

Can Theoretical Physics Research Benefit from Language Agents?

Jun 06, 2025
Via arXiv

Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness

May 29, 2025
Via arXiv

NLP for Social Good: A Survey of Challenges, Opportunities, and Responsible Deployment

May 28, 2025
Via arXiv

Are Language Models Consequentialist or Deontological Moral Reasoners?

May 27, 2025
Via arXiv