As Artificial Intelligence (AI) is integrated into diverse areas, particularly content generation, ensuring rightful ownership and ethical use becomes paramount. AI service providers are expected to prioritize responsible sourcing of training data and to obtain licenses from data owners. However, existing studies primarily center on safeguarding static copyrights, which simply treat metadata/datasets as non-fungible items with transferable/trading capabilities, neglecting the dynamic nature of training procedures, which shape a model along an ongoing trajectory. In this paper, we present \textsc{IBis}, a blockchain-based framework tailored for AI model training workflows. \textsc{IBis} integrates on-chain registries for datasets, licenses, and models, alongside off-chain signing services that facilitate collaboration among multiple participants. Our framework addresses concerns regarding data and model provenance and copyright compliance. \textsc{IBis} enables iterative model retraining and fine-tuning, and offers flexible license checks and renewals. Further, \textsc{IBis} provides APIs designed for seamless integration with existing contract management software, minimizing disruptions to established model training processes. We implement \textsc{IBis} using Daml on the Canton blockchain. Evaluation results demonstrate the feasibility and scalability of \textsc{IBis} across varying numbers of users, datasets, models, and licenses.
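To make the registry-plus-license-check idea concrete, here is a minimal Python sketch (not Daml) under stated assumptions. The class and method names (`Registry`, `grant_license`, `register_model`, `provenance`) are hypothetical and do not reflect the actual \textsc{IBis} templates or APIs. The off-chain owner signature is reduced to a simple identity check. Renewal is modeled as the owner re-granting a license with a later expiry.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical sketch only; this is not the IBis Daml implementation.


@dataclass
class License:
    dataset_id: str
    licensee: str
    expires: date


@dataclass
class ModelRecord:
    trainer: str
    dataset_ids: list
    parent: Optional[str]  # model this one was retrained / fine-tuned from


@dataclass
class Registry:
    datasets: dict = field(default_factory=dict)  # dataset_id -> owner
    licenses: dict = field(default_factory=dict)  # (dataset_id, licensee) -> License
    models: dict = field(default_factory=dict)    # model_id -> ModelRecord

    def register_dataset(self, dataset_id: str, owner: str) -> None:
        if dataset_id in self.datasets:
            raise ValueError(f"dataset {dataset_id!r} already registered")
        self.datasets[dataset_id] = owner

    def grant_license(self, dataset_id: str, signer: str, licensee: str, expires: date) -> None:
        # Only the dataset owner may grant or renew (stands in for an owner signature).
        if self.datasets.get(dataset_id) != signer:
            raise PermissionError(f"{signer!r} does not own {dataset_id!r}")
        self.licenses[(dataset_id, licensee)] = License(dataset_id, licensee, expires)

    def has_valid_license(self, dataset_id: str, licensee: str, today: date) -> bool:
        lic = self.licenses.get((dataset_id, licensee))
        return lic is not None and today <= lic.expires

    def register_model(self, model_id: str, trainer: str, dataset_ids: list,
                       today: date, parent: Optional[str] = None) -> None:
        # License check: every dataset used in this round must be licensed to the trainer.
        missing = [d for d in dataset_ids if not self.has_valid_license(d, trainer, today)]
        if missing:
            raise PermissionError(f"no valid license for {missing}")
        if parent is not None and parent not in self.models:
            raise KeyError(f"unknown parent model {parent!r}")
        self.models[model_id] = ModelRecord(trainer, list(dataset_ids), parent)

    def provenance(self, model_id: str) -> list:
        # Walk the retraining/fine-tuning chain; collect every dataset that shaped the model.
        seen = []
        current: Optional[str] = model_id
        while current is not None:
            rec = self.models[current]
            for d in rec.dataset_ids:
                if d not in seen:
                    seen.append(d)
            current = rec.parent
        return seen
```

Because each model record points to its parent, a fine-tuned model's provenance covers both its own datasets and those of every ancestor. A license that is valid today does not retroactively legitimize a model registered while it was missing.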
Blockchained Federated Learning (FL) has been gaining traction for ensuring the integrity and traceability of FL processes. In blockchained FL, participants train models locally on their data and then publish the models on the blockchain, forming a Directed Acyclic Graph (DAG)-like inheritance structure that represents the relationships among models. However, this DAG-based structure makes it difficult to update models trained on sensitive data, because of the complexity and overhead involved. To address this, we propose Blockchained Federated Unlearning (BlockFUL), a generic framework that redesigns the blockchain structure using Chameleon Hash (CH) technology to reduce the complexity of model updating, thereby lowering the computational and consensus costs of unlearning tasks. Furthermore, BlockFUL supports various federated unlearning methods and ensures the integrity and traceability of model updates, whether they are conducted in parallel or in series. We conduct a comprehensive study of two typical unlearning methods, gradient ascent and re-training, demonstrating efficient unlearning workflows in both categories with minimal CH and block update operations. Additionally, we compare the computation and communication costs of these methods.
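The key property of a Chameleon Hash is that whoever holds a trapdoor can find a new randomness value giving the same hash for different content. This is what lets a block's payload be rewritten without breaking the links stored in later blocks. The sketch below shows one well-known construction, the discrete-log chameleon hash of Krawczyk and Rabin. It uses deliberately tiny, insecure group parameters. How BlockFUL actually instantiates CH is not stated here, so the block layout and the helper names (`keygen`, `ch`, `find_collision`, `redact`) are hypothetical.

```python
import hashlib
import secrets

# Toy group parameters (INSECURE, illustration only): p = 2q + 1 is a safe prime,
# and g = 4 (a quadratic residue != 1) generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


def msg_to_int(msg: bytes) -> int:
    # Map arbitrary content (e.g. serialized model weights) into Z_q.
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % Q


def keygen() -> tuple:
    x = secrets.randbelow(Q - 1) + 1  # trapdoor, in [1, q-1]
    h = pow(G, x, P)                  # public key h = g^x
    return x, h


def ch(h: int, msg: bytes, r: int) -> int:
    # CH(m, r) = g^m * h^r mod p = g^(m + x*r) mod p
    return (pow(G, msg_to_int(msg), P) * pow(h, r, P)) % P


def find_collision(x: int, msg: bytes, r: int, new_msg: bytes) -> int:
    # With the trapdoor, choose r' so that m' + x*r' = m + x*r (mod q).
    m, m_new = msg_to_int(msg), msg_to_int(new_msg)
    return (r + (m - m_new) * pow(x, -1, Q)) % Q


def redact(block: dict, x: int, new_payload: bytes) -> dict:
    # Replace a block's payload (e.g. a model after unlearning). The CH value is unchanged,
    # so later blocks that store it as their previous-block link need not be rewritten.
    r_new = find_collision(x, block["payload"], block["r"], new_payload)
    return {"payload": new_payload, "r": r_new}
```

Without `x`, finding such a collision is as hard as computing a discrete logarithm (in a properly sized group). This is why the trapdoor holder, and only the trapdoor holder, can perform the update. `pow(x, -1, Q)` requires Python 3.8 or later.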
As a distributed learning paradigm, Federated Learning (FL) faces two challenges: the unbalanced distribution of training data among participants, and model attacks by Byzantine nodes. In this paper, we consider long-tailed data distributions in the presence of Byzantine nodes in the FL scenario. We propose a novel two-layer aggregation method that rejects malicious models and judiciously selects valuable models containing tail-class information. We introduce the concept of a think tank to leverage the wisdom of all participants. Preliminary experiments validate that the think tank can make effective model selections for global aggregation.
The demand for intelligent industries and smart services based on big data is rising rapidly with the increasing digitization and intelligence of the modern world. This survey comprehensively reviews Blockchained Federated Learning (BlockFL), which combines the benefits of Blockchain and Federated Learning to provide a secure and efficient solution to this demand. We compare existing BlockFL models in four Internet-of-Things (IoT) application scenarios: Personal IoT (PIoT), Industrial IoT (IIoT), Internet of Vehicles (IoV), and Internet of Health Things (IoHT), focusing on security and privacy, trust and reliability, efficiency, and data heterogeneity. Our analysis shows that decentralization and transparency make BlockFL a secure and effective solution for distributed model training, while its overhead and compatibility still need further study. It also reveals that each domain presents unique challenges, e.g., the need to accommodate dynamic environments in IoV and the high demands of identity and permission management in IoHT, in addition to common challenges such as privacy, resource constraints, and data heterogeneity. Furthermore, we examine existing technologies that can benefit BlockFL, thereby helping researchers and practitioners make informed decisions about selecting and developing BlockFL for various IoT application scenarios.
Federated learning (FL) provides an effective machine learning (ML) architecture for protecting data privacy in a distributed manner. However, inevitable network asynchrony, over-dependence on a central coordinator, and the lack of an open and fair incentive mechanism collectively hinder its further development. We propose \textsc{IronForge}, a new-generation FL framework that features a Directed Acyclic Graph (DAG)-based data structure and eliminates the need for central coordinators, achieving fully decentralized operation. \textsc{IronForge} runs in a public and open network and provides a fair incentive mechanism by enabling state consistency in the DAG, so that the system suits networks where training resources are unevenly distributed. In addition, we present dedicated defense strategies against prevalent FL attacks on incentive fairness and data privacy to ensure the security of \textsc{IronForge}. Experimental results on FLSim, a newly developed testbed, highlight the superiority of \textsc{IronForge} over existing prevalent FL frameworks in performance, fairness, and security under various specifications. To the best of our knowledge, \textsc{IronForge} is the first secure and fully decentralized FL framework that can be applied in open networks with realistic network and training settings.
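The coordinator-free DAG idea can be illustrated with a minimal Python sketch, under these assumptions:

* Each published model update references the earlier models it builds on.
* A participant joining later starts from an unweighted element-wise average of the current tips.

The names `DagNode`, `ModelDag`, `tips`, and `aggregate` are hypothetical, and plain averaging is a stand-in. The abstract does not specify how \textsc{IronForge} selects parents, aggregates models, or enforces incentives and defenses, so none of that is modeled here.

```python
from dataclasses import dataclass, field

# Hypothetical sketch; not IronForge's actual data structure or aggregation rule.


@dataclass
class DagNode:
    node_id: str
    publisher: str
    weights: list  # flattened model parameters
    parents: list  # ids of the models this update builds on


@dataclass
class ModelDag:
    nodes: dict = field(default_factory=dict)     # node_id -> DagNode
    children: dict = field(default_factory=dict)  # node_id -> ids of nodes referencing it

    def publish(self, node: DagNode) -> None:
        # Any participant may append an update; parents must already exist (no coordinator).
        for p in node.parents:
            if p not in self.nodes:
                raise KeyError(f"unknown parent {p!r}")
        self.nodes[node.node_id] = node
        self.children[node.node_id] = []
        for p in node.parents:
            self.children[p].append(node.node_id)

    def tips(self) -> list:
        # Tips are models that no later update has built on yet.
        return [n for n, kids in self.children.items() if not kids]

    def aggregate(self, node_ids: list) -> list:
        # Unweighted element-wise average of the selected models (stand-in aggregation).
        vecs = [self.nodes[n].weights for n in node_ids]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
```

A usage pattern consistent with the sketch:

1. Two participants each publish an update on top of a genesis model.
2. A third participant averages the two resulting tips.
3. The third participant publishes its own update referencing both tips, which makes that update the single new tip.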