Alert button
Picture for Mitchell Wortsman

Mitchell Wortsman

Alert button

Language models scale reliably with over-training and on downstream tasks

Add code
Bookmark button
Alert button
Mar 13, 2024
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

Figure 1 for Language models scale reliably with over-training and on downstream tasks
Figure 2 for Language models scale reliably with over-training and on downstream tasks
Figure 3 for Language models scale reliably with over-training and on downstream tasks
Figure 4 for Language models scale reliably with over-training and on downstream tasks
Viaarxiv icon

OLMo: Accelerating the Science of Language Models

Add code
Bookmark button
Alert button
Feb 07, 2024
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi

Viaarxiv icon

Small-scale proxies for large-scale Transformer training instabilities

Add code
Bookmark button
Alert button
Sep 25, 2023
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Figure 1 for Small-scale proxies for large-scale Transformer training instabilities
Figure 2 for Small-scale proxies for large-scale Transformer training instabilities
Figure 3 for Small-scale proxies for large-scale Transformer training instabilities
Figure 4 for Small-scale proxies for large-scale Transformer training instabilities
Viaarxiv icon

Replacing softmax with ReLU in Vision Transformers

Add code
Bookmark button
Alert button
Sep 15, 2023
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Figure 1 for Replacing softmax with ReLU in Vision Transformers
Figure 2 for Replacing softmax with ReLU in Vision Transformers
Figure 3 for Replacing softmax with ReLU in Vision Transformers
Figure 4 for Replacing softmax with ReLU in Vision Transformers
Viaarxiv icon

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Add code
Bookmark button
Alert button
Aug 07, 2023
Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

Figure 1 for OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Figure 2 for OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Figure 3 for OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Figure 4 for OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Viaarxiv icon

DataComp: In search of the next generation of multimodal datasets

Add code
Bookmark button
Alert button
May 03, 2023
Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt

Figure 1 for DataComp: In search of the next generation of multimodal datasets
Figure 2 for DataComp: In search of the next generation of multimodal datasets
Figure 3 for DataComp: In search of the next generation of multimodal datasets
Figure 4 for DataComp: In search of the next generation of multimodal datasets
Viaarxiv icon

Stable and low-precision training for large-scale vision-language models

Add code
Bookmark button
Alert button
Apr 25, 2023
Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt

Figure 1 for Stable and low-precision training for large-scale vision-language models
Figure 2 for Stable and low-precision training for large-scale vision-language models
Figure 3 for Stable and low-precision training for large-scale vision-language models
Figure 4 for Stable and low-precision training for large-scale vision-language models
Viaarxiv icon