
Kevin Jamieson

Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning

Jul 02, 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Jun 15, 2024

Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Jun 11, 2024

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

May 29, 2024

Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning

Feb 03, 2024

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Jan 12, 2024

Fair Active Learning in Low-Data Regimes

Dec 13, 2023

Minimax Optimal Submodular Optimization with Bandit Feedback

Oct 27, 2023

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Oct 25, 2023

Optimal Exploration is no harder than Thompson Sampling

Oct 24, 2023