Sandeep Mukherjee

DataS^3: Dataset Subset Selection for Specialization

Apr 22, 2025

Neha Hulkund, Alaa Maalouf, Levi Cai, Daniel Yang, Tsun-Hsuan Wang, Abigail O'Neil, Timm Haucke, Sandeep Mukherjee, Vikram Ramaswamy, Judy Hansen Shen(+8 more)

Abstract: In many real-world machine learning (ML) applications (e.g., detecting broken bones in X-ray images or detecting species in camera-trap images), models need to perform well on specific deployments (e.g., a specific hospital or a specific national park) rather than on the domain broadly. However, deployments often have imbalanced, unique data distributions. A discrepancy between the training distribution and the deployment distribution can lead to suboptimal performance, highlighting the need to select deployment-specialized subsets from the available training data. We formalize dataset subset selection for specialization (DS3): given a training set drawn from a general distribution and a (potentially unlabeled) query set drawn from the desired deployment-specific distribution, the goal is to select a subset of the training data that optimizes deployment performance. We introduce DataS^3, the first dataset and benchmark designed specifically for the DS3 problem. DataS^3 encompasses diverse real-world application domains, each with a set of distinct deployments to specialize in. We conduct a comprehensive study evaluating algorithms from various families (including coresets, data filtering, and data curation) on DataS^3, and find that general-distribution methods consistently fail on deployment-specific tasks. Additionally, we demonstrate the existence of manually curated (deployment-specific) expert subsets that outperform training on all available data, with accuracy gains of up to 51.3 percent. Our benchmark highlights the critical role of tailored dataset curation in enhancing performance and training efficiency on deployment-specific distributions, which we posit will only become more important as global, public datasets become available across domains and ML models are deployed in the real world.
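The DS3 setup described in the abstract (a general training set, a possibly unlabeled query set from the deployment, and a selected subset) can be illustrated with a minimal sketch. This is not one of the methods evaluated in the paper; it is a simple nearest-neighbor baseline chosen for illustration. The function name `select_subset`, the use of precomputed embeddings, and the cosine-similarity scoring are all assumptions, and the embeddings are assumed to be non-zero vectors.

```python
import numpy as np


def select_subset(train_emb, query_emb, k):
    """Return indices of the k training points most similar to the query set.

    Hypothetical baseline: each training point is scored by its highest
    cosine similarity to any query (deployment) point. No labels are used.
    """
    train = np.asarray(train_emb, dtype=float)
    query = np.asarray(query_emb, dtype=float)

    # L2-normalize rows so that a dot product equals cosine similarity.
    # Assumes no embedding is the zero vector.
    train = train / np.linalg.norm(train, axis=1, keepdims=True)
    query = query / np.linalg.norm(query, axis=1, keepdims=True)

    sims = train @ query.T        # shape: (n_train, n_query)
    scores = sims.max(axis=1)     # best match per training point

    # Sort by descending score; the stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


# Toy example: four 2-D training embeddings, one query embedding.
train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([[1.0, 0.1]])
chosen = select_subset(train, query, k=2)
```

In this toy example the two training points pointing in roughly the same direction as the query are selected. A real pipeline would then train only on the selected indices and measure accuracy on the deployment's held-out data.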

Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Jun 29, 2023

Simeon Adebola, Rishi Parikh, Mark Presten, Satvik Sharma, Shrey Aeron, Ananth Rao, Sandeep Mukherjee, Tomson Qu, Christina Wistrom, Eugen Solowjow(+1 more)

Abstract: The AlphaGarden is an automated testbed for indoor polyculture farming which combines a first-order plant simulator, a gantry robot, a seed planting algorithm, plant phenotyping and tracking algorithms, irrigation sensors and algorithms, and custom pruning tools and algorithms. In this paper, we systematically compare the performance of the AlphaGarden to professional horticulturalists on the staff of the UC Berkeley Oxford Tract Greenhouse. The humans and the machine tend side-by-side polyculture gardens with the same seed arrangement. We compare performance in terms of canopy coverage, plant diversity, and water consumption. Results from two 60-day cycles suggest that the automated AlphaGarden performs comparably to professional horticulturalists in terms of coverage and diversity, and reduces water consumption by as much as 44%. Code, videos, and datasets are available at https://sites.google.com/berkeley.edu/systematiccomparison.

* International Conference on Robotics and Automation (ICRA) 2023 Oral

Automated Pruning of Polyculture Plants

Aug 22, 2022

Mark Presten, Rishi Parikh, Shrey Aeron, Sandeep Mukherjee, Simeon Adebola, Satvik Sharma, Mark Theis, Walter Teitelbaum, Ken Goldberg

Abstract: Polyculture farming has environmental advantages but requires substantially more pruning than monoculture farming. We present novel hardware and algorithms for automated pruning. Using an overhead camera to collect data from a physical scale garden testbed, the autonomous system uses a learned Plant Phenotyping convolutional neural network and a Bounding Disk Tracking algorithm to evaluate the individual plant distribution and estimate the state of the garden each day. From this garden state, AlphaGardenSim selects plants to autonomously prune. A trained neural network detects and targets specific prune points on the plant. Two custom-designed pruning tools, compatible with a FarmBot gantry system, are evaluated experimentally and execute autonomous cuts under algorithmic control. We present results for four 60-day garden cycles. Results suggest the system can autonomously achieve 0.94 normalized plant diversity with pruning shears while maintaining an average canopy coverage of 0.84 by the end of the cycles. For code, videos, and datasets, see https://sites.google.com/berkeley.edu/pruningpolyculture.

* CASE 2022, 8 pages. arXiv admin note: substantial text overlap with arXiv:2111.06014

AlphaGarden: Learning to Autonomously Tend a Polyculture Garden

Nov 11, 2021

Mark Presten, Yahav Avigal, Mark Theis, Satvik Sharma, Rishi Parikh, Shrey Aeron, Sandeep Mukherjee, Sebastian Oehme, Simeon Adebola, Walter Teitelbaum(+2 more)

Abstract: This paper presents AlphaGarden: an autonomous polyculture garden that prunes and irrigates living plants in a 1.5m x 3.0m physical testbed. AlphaGarden uses an overhead camera and sensors to track the plant distribution and soil moisture. We model individual plant growth and interplant dynamics to train a policy that chooses actions to maximize leaf coverage and diversity. For autonomous pruning, AlphaGarden uses two custom-designed pruning tools and a trained neural network to detect prune points. We present results for four 60-day garden cycles. Results suggest AlphaGarden can autonomously achieve 0.96 normalized diversity with pruning shears while maintaining an average canopy coverage of 0.86 during the peak of the cycle. Code, datasets, and supplemental material can be found at https://github.com/BerkeleyAutomation/AlphaGarden.
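The abstract reports canopy coverage and normalized diversity without defining them. One plausible reading is sketched below; the definitions are assumptions and are not taken from the paper. Coverage is taken to be the fraction of the bed's pixels covered by any plant, and normalized diversity is taken to be the Shannon entropy of each species' share of the canopy, divided by its maximum value. The function names and the species names in the example are hypothetical.

```python
import math


def canopy_coverage(covered_pixels, total_pixels):
    """Fraction of the garden bed covered by plant canopy (assumed definition)."""
    return covered_pixels / total_pixels


def normalized_diversity(area_by_species):
    """Shannon entropy of canopy shares, scaled to [0, 1] (assumed definition).

    Returns 1.0 when every species covers the same area and 0.0 when a
    single species accounts for all of the canopy.
    """
    total = sum(area_by_species.values())
    n_species = len(area_by_species)
    if total <= 0 or n_species < 2:
        return 0.0
    entropy = 0.0
    for area in area_by_species.values():
        if area > 0:
            share = area / total
            entropy -= share * math.log(share)
    # log(n_species) is the largest entropy possible for n_species classes.
    return entropy / math.log(n_species)


# Hypothetical example: three species with unequal canopy areas (in pixels).
areas = {"kale": 400, "borage": 300, "cilantro": 300}
coverage = canopy_coverage(sum(areas.values()), 1200)
diversity = normalized_diversity(areas)
```

Under these assumed definitions, a policy that maximizes both quantities favors a garden that is densely covered while no single species dominates the canopy.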

* 7 pages, 7 figures, 2 tables
