Chatrik Singh Mangat

From Stability to Inconsistency: A Study of Moral Preferences in LLMs

Apr 08, 2025

FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research

Mar 29, 2025

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024