PDFed: Privacy-Preserving and Decentralized Asynchronous Federated Learning for Diffusion Models

University of Surrey[1], Adobe Research[2]
Conference on Visual Media Production 2024

Data memorization presents a privacy challenge for the federated training of diffusion models, where private data can be regurgitated as training progresses. PDFed proposes a novel measure of this behavior and mitigates it through an aggregator-free distributed training protocol built on distributed ledger technology (DLT), or blockchain, which we later show reduces data memorization.

Abstract

We present PDFed, a decentralized, aggregator-free, and asynchronous federated learning protocol for training image diffusion models using a public blockchain. In general, diffusion models are prone to memorization of training data, raising privacy and ethical concerns (e.g., regurgitation of private training data in generated images). Federated learning (FL) offers a partial solution via collaborative model training across distributed nodes that safeguard local data privacy. PDFed proposes a novel sample-based score that measures the novelty and quality of generated samples, incorporating these into a blockchain-based federated learning protocol that we show reduces private data memorization in the collaboratively trained model. In addition, PDFed enables asynchronous collaboration among participants with varying hardware capabilities, facilitating broader participation. The protocol records the provenance of AI models, improving transparency and auditability, while also considering automated incentive and reward mechanisms for participants. PDFed aims to empower artists and creators by protecting the privacy of creative works and enabling decentralized, peer-to-peer collaboration. The protocol positively impacts the creative economy by opening up novel revenue streams and fostering innovative ways for artists to benefit from their contributions to the AI space.

System Architecture: client nodes, holding private training/validation data, interact with the federated learning task smart contract deployed on an Ethereum blockchain. The system design allows nodes to participate asynchronously, contributing to model evaluation and selection while retaining autonomy over training strategies and resource usage. Model weights are stored on the peer-to-peer storage network IPFS, accompanied by C2PA manifests. An example manifest is included on the left, in which a client node describes a model contribution within PDFed. Highlighted in yellow: C2PA supports specifying ingredient assets; in this case, the IPFS storage CID of the model chosen for further training is included. Green: this enables the client node to describe the submitted model's training steps. Blue: using the crypto.addresses assertion, the client node specifies their blockchain wallet address, where payments may be sent as a reward for their contributions to model training.

Left: all metrics reported on the baseline experiment, evaluated along the training process on the entire CIFAR-10 dataset. Right: models in a) were trained with the objective L_{FLD+FID} and models in b) with the objective L_{Q-N}, for experiments with 2 and 6 client nodes. The scores shown are the Q-N score (green), AuthPct (red), FLD (pink), and C_T (blue). Our proposed training method using L_{Q-N} decreases memorization, and our proposed Q-N metric is more sensitive to recognizing memorization behavior than all other metrics.

BibTeX

@inproceedings{Balan:PDFed:CVMP:2024,
        AUTHOR = "Balan, Kar and Gilbert, Andrew and Collomosse, John",
        TITLE = "PDFed: Privacy-Preserving and Decentralized Asynchronous Federated Learning for Diffusion Models",
        BOOKTITLE = "Conference on Visual Media Production (CVMP'24)",
        YEAR = "2024",
        }