
Nikos Karampatziakis

Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning

Jul 02, 2024

Active, anytime-valid risk controlling prediction sets

Jun 15, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Apr 23, 2024

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Oct 23, 2023

Meet in the Middle: A New Pre-training Paradigm

Mar 13, 2023

Anytime-valid off-policy inference for contextual bandits

Oct 19, 2022

Contextual Bandit Applications in Customer Support Bot

Dec 06, 2021

Off-policy Confidence Sequences

Feb 18, 2021

Empirical Likelihood for Contextual Bandits

Jun 21, 2019

Lessons from Real-World Reinforcement Learning in a Customer Support Bot

May 06, 2019