Rapid advancement of machine learning solutions has often coincided with the production of a public test dataset. Such datasets reduce the largest barrier to entry for tackling a problem -- procuring data -- while also providing a benchmark for comparing different solutions. Furthermore, large datasets have been used to train high-performing feature extractors, which are then used in new approaches to problems beyond those initially defined. To encourage rapid development in the analysis of data collected using liquid argon time projection chambers, a class of particle detectors used in high energy physics experiments, we have produced PILArNet, the first open 2D and 3D dataset to be used for a couple of key analysis tasks. The initial dataset presented in this paper contains 300,000 samples simulated and recorded in three different volume sizes. The dataset is stored efficiently in sparse 2D and 3D matrix formats, with auxiliary information about the simulated particles in the volume, and is made available for public research use. In this paper we describe the dataset, the tasks, and the method used to procure the samples.
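To illustrate what a sparse matrix format means for this kind of data, the following is a minimal sketch that assumes a coordinate-list (COO) layout: one array of voxel coordinates and one array of voxel values. The actual storage format of PILArNet is not specified here, and the helper names `to_sparse` and `to_dense` are hypothetical.

```python
import numpy as np

def to_sparse(dense):
    """Return (coords, values) for the non-zero entries of a dense array.

    Assumed layout (not taken from the dataset itself): coords has shape
    (N, ndim) and values has shape (N,), one row per non-zero voxel.
    """
    coords = np.argwhere(dense != 0)
    values = dense[tuple(coords.T)]
    return coords, values

def to_dense(coords, values, shape):
    """Rebuild the dense array from its coordinate-list representation."""
    dense = np.zeros(shape, dtype=values.dtype)
    dense[tuple(coords.T)] = values
    return dense

# Toy 3D volume with three non-zero voxels (illustrative values only).
volume = np.zeros((64, 64, 64), dtype=np.float32)
volume[10, 20, 30] = 1.5
volume[11, 20, 30] = 0.7
volume[12, 21, 30] = 2.0

coords, values = to_sparse(volume)
restored = to_dense(coords, values, volume.shape)
```

In this toy case the dense volume holds 64^3 entries, while the sparse form holds only three coordinate rows and three values, which is why a sparse layout suits mostly empty detector volumes.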
Deep convolutional neural networks (CNNs) show strong promise for analyzing scientific data in many domains, including data from particle imaging detectors such as liquid argon time projection chambers (LArTPCs). Yet the high sparsity of LArTPC data challenges traditional CNNs, which were designed for dense data such as photographs. A naive application of CNNs to LArTPC data results in inefficient computation and poor scalability to large LArTPC detectors such as those of the Short Baseline Neutrino Program and the Deep Underground Neutrino Experiment. Recently, Submanifold Sparse Convolutional Networks (SSCNs) have been proposed to address this challenge. We report their performance on a 3D semantic segmentation task on simulated LArTPC samples. In comparison with standard CNNs, we observe that the memory and wall-time costs of inference are reduced by factors of 364 and 33, respectively, without loss of accuracy. The corresponding factors for 2D samples are 93 and 3.1, respectively. Using SSCNs, we present the first machine learning-based approach to the reconstruction of Michel electrons using public 3D LArTPC samples. We find a Michel electron identification efficiency of 93.9\% with a true positive rate of 98.8\%. Reconstructed Michel electron clusters achieve an average pixel clustering efficiency of 96.1\% and a purity of 97.3\%. These results demonstrate the strong promise of scalable data reconstruction techniques using deep neural networks for large-scale LArTPC detectors.
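The pixel clustering efficiency and purity quoted above can be made concrete with a small sketch. It assumes the common definitions, which may differ in detail from those used for the reported numbers: efficiency is the fraction of true cluster pixels that were recovered, and purity is the fraction of predicted cluster pixels that are correct. The function name and the toy masks are hypothetical.

```python
import numpy as np

def clustering_efficiency_purity(true_mask, pred_mask):
    """Compute (efficiency, purity) for one cluster from boolean pixel masks.

    Assumed definitions:
      efficiency = |true AND pred| / |true|
      purity     = |true AND pred| / |pred|
    Returns NaN for a quantity whose denominator is zero.
    """
    true_mask = np.asarray(true_mask, dtype=bool)
    pred_mask = np.asarray(pred_mask, dtype=bool)
    overlap = np.count_nonzero(true_mask & pred_mask)
    n_true = np.count_nonzero(true_mask)
    n_pred = np.count_nonzero(pred_mask)
    efficiency = overlap / n_true if n_true else float("nan")
    purity = overlap / n_pred if n_pred else float("nan")
    return efficiency, purity

# Toy example: 4 true pixels, 4 predicted pixels, 3 in common.
true_mask = [1, 1, 1, 1, 0, 0]
pred_mask = [1, 1, 1, 0, 1, 0]
eff, pur = clustering_efficiency_purity(true_mask, pred_mask)
print(f"efficiency={eff:.2f}, purity={pur:.2f}")  # efficiency=0.75, purity=0.75
```

An average over many clusters, as in the figures quoted above, would apply this per cluster and take the mean of each quantity.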