Dara Bahri

Sharpness-Aware Minimization Leads to Low-Rank Features

May 25, 2023
Maksym Andriushchenko, Dara Bahri, Hossein Mobahi, Nicolas Flammarion

Is margin all you need? An extensive empirical study of active learning on tabular data

Oct 07, 2022
Dara Bahri, Heinrich Jiang, Tal Schuster, Afshin Rostamizadeh

Confident Adaptive Language Modeling

Jul 14, 2022
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Unifying Language Learning Paradigms

May 10, 2022
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

Apr 25, 2022
Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Transformer Memory as a Differentiable Search Index

Feb 16, 2022
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

Nov 22, 2021
Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Sharpness-Aware Minimization Improves Language Model Generalization

Oct 16, 2021
Dara Bahri, Hossein Mobahi, Yi Tay

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Jul 02, 2021
Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

Jun 29, 2021
Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
