The promise of "free and open" multi-terabyte datasets often collides with harsh realities. While these datasets may be technically accessible, practical barriers -- from processing complexity to hidden costs -- create a system that primarily serves well-funded institutions. This study examines accessibility challenges across web crawls, satellite imagery, scientific data, and collaborative projects, revealing a consistent two-tier system in which theoretical openness masks practical exclusivity. Our analysis shows that datasets marketed as "publicly accessible" typically require a minimum investment of \$1,000 or more for meaningful analysis, while complex processing pipelines demand \$10,000--\$100,000 or more in infrastructure. These requirements -- distributed-computing knowledge, domain expertise, and substantial budgets -- effectively gate access to the data despite its "open" status, limiting practical use to those with institutional support or substantial resources.