Machine Learning (ML) research publications commonly provide open-source implementations on GitHub, allowing their audience to replicate, validate, or even extend machine learning algorithms, data sets, and metadata. However, thus far little is known about the degree of collaboration activity happening on such ML research repositories, in particular regarding (1) the degree to which such repositories receive contributions from forks, (2) the nature of such contributions (i.e., the types of changes), and (3) the nature of changes that are not contributed back to forks, which might represent missed opportunities. In this paper, we empirically study contributions to 1,346 ML research repositories and their 67,369 forks, both quantitatively and qualitatively (by building on Hindle et al.'s seminal taxonomy of code changes). We found that while ML research repositories are heavily forked, only 9% of the forks made modifications to the forked repository. 42% of the latter sent changes to the parent repositories, half of which (52%) were accepted by the parent repositories. Our qualitative analysis on 539 contributed and 378 local (fork-only) changes, extends Hindle et al.'s taxonomy with one new top-level change category related to ML (Data), and 15 new sub-categories, including nine ML-specific ones (input data, output data, program data, sharing, change evaluation, parameter tuning, performance, pre-processing, model training). While the changes that are not contributed back by the forks mostly concern domain-specific customizations and local experimentation (e.g., parameter tuning), the origin ML repositories do miss out on a non-negligible 15.4% of Documentation changes, 13.6% of Feature changes and 11.4% of Bug fix changes. The findings in this paper will be useful for practitioners, researchers, toolsmiths, and educators.
Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train and deploy such models, making effective reuse of the ML models a necessity. Such discovery and reuse by practitioners and researchers are being addressed by public ML package repositories, which bundle up pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities and usage contexts against popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique SE practices and challenges for sharing ML packages. These findings and implications would be useful for data scientists, researchers and software developers who intend to use these shared ML packages.
The coordination of robot swarms - large decentralized teams of robots - generally relies on robust and efficient inter-robot communication. Maintaining communication between robots is particularly challenging in field deployments. Unstructured environments, limited computational resources, low bandwidth, and robot failures all contribute to the complexity of connectivity maintenance. In this paper, we propose a novel lightweight algorithm to navigate a group of robots in complex environments while maintaining connectivity by building a chain of robots. The algorithm is robust to single robot failures and can heal broken communication links. The algorithm works in 3D environments: when a region is unreachable by wheeled robots, the chain is extended with flying robots. We test the performance of the algorithm using up to 100 robots in a physics-based simulator with three mazes and different robot failure scenarios. We then validate the algorithm with physical platforms: 7 wheeled robots and 6 flying ones, in homogeneous and heterogeneous scenarios.
Connectivity maintenance plays a key role in achieving a desired global behavior among a swarm of robots. However, connectivity maintenance in realistic environments is hampered by lack of computation resources, low communication bandwidth, robot failures, and unstable links. In this paper, we propose a novel decentralized connectivity-preserving algorithm that can be deployed on top of other behaviors to enforce connectivity constraints. The algorithm takes a set of targets to be reached while keeping a minimum number of redundant links between robots, with the goal of guaranteeing bandwidth and reliability. Robots then incrementally build and maintain a communication backbone with the specified number of links. We empirically study the performance of the algorithm, analyzing its time to convergence, as well as robustness to faults injected into the backbone robots. Our results statistically demonstrate the algorithm's ability to preserve the desired connectivity constraints and to reach the targets with up to 70 percent of individual robot failures in the communication backbone.