Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

Enabling collaborative data science development with the Ballet framework

Dec 14, 2020
Micah J. Smith, Jürgen Cito, Kelvin Lu, Kalyan Veeramachaneni

Share this with someone who'll enjoy it:

While the open-source model for software development has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small groups. We describe challenges to scaling data science collaborations and present a novel ML programming model to address them. We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science and a cloud-based development environment, with a plugin for collaborative feature engineering. Using our framework, collaborators incrementally propose feature definitions to a repository which are each subjected to an ML evaluation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct an extensive case study analysis of a real-world income prediction problem, and discuss implications for collaborative projects.

   Access Paper Source

Share this with someone who'll enjoy it: