Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train, and deploy models, making effective reuse of ML models a necessity. Such discovery and reuse by practitioners and researchers are being addressed by public ML package repositories, which bundle pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is little empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities, and usage contexts against those of popular software package repositories (npm, PyPI, and CRAN). Through these studies, we identify unique SE practices and challenges for sharing ML packages. These findings and their implications should be useful for data scientists, researchers, and software developers who intend to use shared ML packages.
The popularity of artificial intelligence (AI) has led to the development of more and more AI software products. Because they often lack specialized AI knowledge, domain data, and computational resources, developers are in great need of transfer learning-based AI product development. This need is met by model zoos and stores, where pretrained deep learning (DL) assets are shared. By integrating these DL assets, developers can give their products AI capabilities, but the activity behind this simple sentence can be non-trivial. Like traditional software products, AI products go through a release engineering (RE) process, a part of software development that includes steps such as integration, testing, system building, and deployment. Considering RE in transfer learning-based AI product development helps make such products better, but the differences between AI products and traditional software products make the relevant concerns and required efforts hard to identify and estimate. Unfortunately, little research has so far focused on RE for AI products. This research tries to fill that gap. First, we conduct an investigation: we examine the deployment scenarios supported by TensorFlow and PyTorch, the two most widely used DL frameworks, as well as two model zoos, TFHub (the TensorFlow module of AIHub) and PyTorch Hub, where DL assets developed with TensorFlow and PyTorch are shared. We investigate the family phenomenon and versioning issues in these model zoos. Second, we identify the concerns and efforts involved in RE for AI products. We propose a best practice for the development of transfer learning-based AI products and conduct a case study to verify the feasibility of the proposed practice. This research can help AI product developers in their development activities.