Mayank Mishra


Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Apr 08, 2024
Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Via arXiv

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Apr 04, 2024
Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

Via arXiv

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Apr 03, 2024
Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu

Via arXiv

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Mar 30, 2024
Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo

Via arXiv

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Via arXiv

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Feb 04, 2024
Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernandez Astudillo

Via arXiv

Prompting with Pseudo-Code Instructions

May 22, 2023
Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V, Danish Contractor, Srikanth Tamilselvam

Via arXiv

StarCoder: may the source be with you!

May 09, 2023
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Via arXiv

SantaCoder: don't reach for the stars!

Jan 09, 2023
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra

Via arXiv

Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Dec 28, 2022
Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu

Via arXiv