Srivatsa Kundurthy

LAION-5B: An open large-scale dataset for training next generation image-text models

Oct 16, 2022

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman(+6 more)

Abstract: Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B - a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale. Additionally we provide several nearest neighbor indices, an improved web-interface for dataset exploration and subset generation, and detection scores for watermark, NSFW, and toxic content detection. Announcement page: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/

* 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Track on Datasets and Benchmarks. OpenReview: https://openreview.net/forum?id=M3Y74vmsMcY

LANTERN-RD: Enabling Deep Learning for Mitigation of the Invasive Spotted Lanternfly

May 12, 2022

Srivatsa Kundurthy

Abstract: The Spotted Lanternfly (SLF) is an invasive planthopper that threatens the local biodiversity and agricultural economy of regions such as the Northeastern United States and Japan. As researchers scramble to study the insect, there is a great potential for computer vision tasks such as detection, pose estimation, and accurate identification to have important downstream implications in containing the SLF. However, there is currently no publicly available dataset for training such AI models. To enable computer vision applications and motivate advancements to challenge the invasive SLF problem, we propose LANTERN-RD, the first curated image dataset of the spotted lanternfly and its look-alikes, featuring images with varied lighting conditions, diverse backgrounds, and subjects in assorted poses. A VGG16-based baseline CNN validates the potential of this dataset for stimulating fresh computer vision applications to accelerate invasive SLF research. Additionally, we implement the trained model in a simple mobile classification application in order to directly empower responsible public mitigation efforts. The overarching mission of this work is to introduce a novel SLF image dataset and release a classification framework that enables computer vision applications, boosting studies surrounding the invasive SLF and assisting in minimizing its agricultural and economic damage.

* Under Review at IEEE Conference on Computer Vision and Pattern Recognition, CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling Workshop, 2022
