Zhuoran Jin

Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models

Jun 23, 2024
Via arXiv

Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

Jun 18, 2024
Via arXiv

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

Jun 16, 2024
Via arXiv

SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents

Mar 05, 2024
Via arXiv

Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

Feb 29, 2024
Via arXiv

Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning

Feb 28, 2024
Via arXiv

Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models

Feb 28, 2024
Via arXiv

Tug-of-War Between Knowledge: Exploring and Resolving Knowledge Conflicts in Retrieval-Augmented Language Models

Feb 22, 2024
Via arXiv