Nitish Joshi

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Oct 01, 2025
Via arXiv

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

Jun 12, 2025
Via arXiv

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

Jun 05, 2025
Via arXiv

Transformers Struggle to Learn to Search

Dec 06, 2024
Via arXiv

LLMs Are Prone to Fallacies in Causal Inference

Jun 18, 2024
Via arXiv

Personas as a Way to Model Truthfulness in Language Models

Oct 30, 2023
Via arXiv

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

May 24, 2023
Via arXiv

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

May 22, 2023
Via arXiv

Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens

Oct 25, 2022
Via arXiv

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

Oct 04, 2022
Via arXiv