
Hiteshi Sharma

Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

Jul 18, 2024

Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning

Jul 02, 2024

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

May 29, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024

Language Models can be Logical Solvers

Nov 10, 2023

ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning

Sep 27, 2023

Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

Sep 25, 2023

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Jun 06, 2023

Randomized Policy Learning for Continuous State and Action MDPs

Jun 08, 2020

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Oct 15, 2019