Kar Balan

A Framework for Cryptographic Verifiability of End-to-End AI Pipelines

Mar 28, 2025

Kar Balan, Robert Learney, Tim Wood

Abstract: The increasing integration of Artificial Intelligence across multiple industry sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety. In this paper, we propose a framework for complete verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but that are efficiently 'linkable' across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.

* Accepted to the 11th ACM International Workshop on Security and Privacy Analytics (IWSPA 2025)

PDFed: Privacy-Preserving and Decentralized Asynchronous Federated Learning for Diffusion Models

Sep 26, 2024

Kar Balan, Andrew Gilbert, John Collomosse

Abstract: We present PDFed, a decentralized, aggregator-free, and asynchronous federated learning protocol for training image diffusion models using a public blockchain. In general, diffusion models are prone to memorization of training data, raising privacy and ethical concerns (e.g., regurgitation of private training data in generated images). Federated learning (FL) offers a partial solution via collaborative model training across distributed nodes that safeguard local data privacy. PDFed proposes a novel sample-based score that measures the novelty and quality of generated samples, incorporating these into a blockchain-based federated learning protocol that we show reduces private data memorization in the collaboratively trained model. In addition, PDFed enables asynchronous collaboration among participants with varying hardware capabilities, facilitating broader participation. The protocol records the provenance of AI models, improving transparency and auditability, while also considering automated incentive and reward mechanisms for participants. PDFed aims to empower artists and creators by protecting the privacy of creative works and enabling decentralized, peer-to-peer collaboration. The protocol positively impacts the creative economy by opening up novel revenue streams and fostering innovative ways for artists to benefit from their contributions to the AI space.

* Accepted to the ACM SIGGRAPH European Conference on Visual Media Production 2024

DECORAIT -- DECentralized Opt-in/out Registry for AI Training

Sep 25, 2023

Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse

Abstract: We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources. Model and content creators who may wish to share their work openly without sanctioning its use for training are thus presented with a data governance challenge. Further, establishing the provenance of GenAI training data is important to creatives to ensure fair recognition and reward for such use of their work. We report a prototype of DECORAIT, which explores hierarchical clustering and a combination of on/off-chain storage to create a scalable decentralized registry to trace the provenance of GenAI training data in order to determine training consent and reward creatives who contribute that data. DECORAIT combines distributed ledger technology (DLT) with visual fingerprinting, leveraging the emerging C2PA (Coalition for Content Provenance and Authenticity) standard to create a secure, open registry through which creatives may express consent and data ownership for GenAI.

* Proc. of the 20th ACM SIGGRAPH European Conference on Visual Media Production

EKILA: Synthetic Media Provenance and Attribution for Generative Art

Apr 10, 2023

Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse

Abstract: We present EKILA, a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI). EKILA proposes a robust visual attribution technique and combines this with an emerging content provenance standard (C2PA) to address the problem of synthetic image provenance: determining the generative model and training data responsible for an AI-generated image. Furthermore, EKILA extends the non-fungible token (NFT) ecosystem to introduce a tokenized representation for rights, enabling a triangular relationship between the asset's Ownership, Rights, and Attribution (ORA). Leveraging the ORA relationship enables creators to express agency over training consent and, through our attribution model, to receive apportioned credit, including royalty payments for the use of their assets in GenAI.

* Proc. CVPR Workshop on Media Forensics 2023
