Sainbayar Sukhbaatar

Self-Rewarding Language Models

Jan 18, 2024
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

Via arXiv

Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

Dec 27, 2023
Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

Via arXiv

System 2 Attention (is something you might need too)

Nov 20, 2023
Jason Weston, Sainbayar Sukhbaatar

Via arXiv

A Data Source for Reasoning Embodied Agents

Sep 14, 2023
Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

Via arXiv

Improving Open Language Models by Learning from Organic Interactions

Jun 07, 2023
Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

Via arXiv

Large Language Model Programs

May 09, 2023
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

Via arXiv

Learning to Reason and Memorize with Self-Notes

May 01, 2023
Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

Via arXiv

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

Apr 18, 2023
Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar

Via arXiv

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Feb 16, 2023
Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Via arXiv

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

Jan 05, 2023
Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Via arXiv