Georgios Smyrnis


DataComp: In search of the next generation of multimodal datasets

May 03, 2023
Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt

Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14, which was trained with the same compute budget as our model, by 3.7 percentage points. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.
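
To make the filtering step concrete, here is a minimal sketch of a CLIP-score filtering baseline of the kind the benchmark invites, assuming image and text embeddings have already been computed with a pretrained CLIP model; the function name and threshold are illustrative, not the DataComp API.

    import numpy as np

    def clip_score_filter(image_embs: np.ndarray,
                          text_embs: np.ndarray,
                          threshold: float = 0.3) -> np.ndarray:
        """Return indices of image-text pairs whose cosine similarity between
        precomputed CLIP image and text embeddings exceeds `threshold`.
        (Hypothetical helper, not the official DataComp tooling.)"""
        # Normalize so that the row-wise dot product equals cosine similarity.
        image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
        text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
        scores = np.sum(image_embs * text_embs, axis=1)
        return np.nonzero(scores > threshold)[0]

    # Toy usage: keep the best-aligned pairs from a random candidate pool.
    rng = np.random.default_rng(0)
    img = rng.normal(size=(1000, 512))
    txt = rng.normal(size=(1000, 512))
    kept = clip_score_filter(img, txt, threshold=0.05)
    print(f"kept {len(kept)} of 1000 candidates")

The benchmark itself scores a submission not by the filter but by the downstream performance of the CLIP model trained on the resulting subset with the fixed training code.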

Inverse Problems Leveraging Pre-trained Contrastive Representations

Oct 26, 2021
Sriram Ravula, Georgios Smyrnis, Matt Jordan, Alexandros G. Dimakis

We study a new family of inverse problems for recovering representations of corrupted data. We assume access to a pre-trained representation learning network R(x) that operates on clean images, like CLIP. The problem is to recover the representation R(x) of an image given only a corrupted version A(x), for some known forward operator A. We propose a supervised inversion method that uses a contrastive objective to obtain excellent representations for highly corrupted images. Using a linear probe on our robust representations, we achieve higher accuracy than end-to-end supervised baselines when classifying images with various types of distortions, including blurring, additive noise, and random pixel masking. We evaluate on a subset of ImageNet and observe that our method is robust to varying levels of distortion. Our method outperforms end-to-end baselines across a wide range of forward operators, even when using only a fraction of the labeled data.
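
As a rough illustration of the supervised contrastive objective (the exact architecture and loss details are in the paper; the sketch below only assumes an encoder that maps corrupted images into the same embedding space as R), the i-th prediction is pulled toward the clean representation R(x_i) and pushed away from the other clean representations in the batch, InfoNCE-style.

    import torch
    import torch.nn.functional as F

    def contrastive_inversion_loss(pred_reps: torch.Tensor,
                                   clean_reps: torch.Tensor,
                                   temperature: float = 0.07) -> torch.Tensor:
        """InfoNCE-style loss: the representation predicted from the corrupted
        image A(x_i) should match the clean representation R(x_i) and differ
        from the clean representations of the other images in the batch."""
        pred = F.normalize(pred_reps, dim=1)
        clean = F.normalize(clean_reps, dim=1)
        logits = pred @ clean.t() / temperature      # (B, B) similarity matrix
        targets = torch.arange(pred.size(0), device=pred.device)
        return F.cross_entropy(logits, targets)

    # Hypothetical usage, with `encoder` an inversion network and `clip_model` a frozen CLIP:
    # loss = contrastive_inversion_loss(encoder(corrupted_batch),
    #                                   clip_model.encode_image(clean_batch))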

* Initial version. Final version to appear in the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

Tropical Polynomial Division and Neural Networks

Nov 29, 2019
Georgios Smyrnis, Petros Maragos

In this work, we examine tropical polynomial division, a geometric method that seeks to emulate the division of ordinary polynomials for polynomials over the max-plus semiring. The division is carried out by approximating the Newton polytope of the dividend polynomial with that of the divisor. We then generalize this process and apply it to neural networks with ReLU activations. In particular, we use the intuition it provides to minimize a two-layer fully connected network trained for a binary classification problem. We evaluate this method on a variety of experiments, demonstrating its ability to approximate a network with minimal loss in performance.
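
For concreteness, the max-plus objects involved can be sketched in a few lines (an illustration of tropical polynomials and their link to ReLU networks, not the paper's division algorithm): a tropical polynomial p(x) = max_i (a_i · x + b_i) is a maximum of affine pieces, which is exactly the kind of piecewise-linear function a ReLU layer builds.

    import numpy as np

    def tropical_poly(x: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Evaluate the tropical (max-plus) polynomial
            p(x) = max_i (A[i] . x + b[i])
        at each row of x. Each row of A plays the role of a monomial exponent
        and each entry of b the role of its coefficient."""
        return np.max(x @ A.T + b, axis=1)

    # Two monomials in one variable: p(x) = max(2x + 1, -x)
    A = np.array([[2.0], [-1.0]])
    b = np.array([1.0, 0.0])
    xs = np.array([[-2.0], [0.0], [3.0]])
    print(tropical_poly(xs, A, b))   # -> [2. 1. 7.]

The Newton polytope referenced above is the convex hull of the monomial exponent vectors (the rows of A); the division procedure approximates the dividend's polytope by that of the divisor.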
