Sainbayar Sukhbaatar

Reverse Training to Nurse the Reversal Curse

Mar 20, 2024
Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Mar 12, 2024
Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

Teaching Large Language Models to Reason with Reinforcement Learning

Mar 07, 2024
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Feb 21, 2024
Lucas Lehnert, Sainbayar Sukhbaatar, Paul Mcvay, Michael Rabbat, Yuandong Tian

Self-Rewarding Language Models

Jan 18, 2024
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

Dec 27, 2023
Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

System 2 Attention (is something you might need too)

Nov 20, 2023
Jason Weston, Sainbayar Sukhbaatar

A Data Source for Reasoning Embodied Agents

Sep 14, 2023
Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

Improving Open Language Models by Learning from Organic Interactions

Jun 07, 2023
Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

Large Language Model Programs

May 09, 2023
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li
