Code for the paper 'FLAIM: AIM-based synthetic data generation in the federated setting'
@inproceedings{flaim,
  author    = {Maddock, Samuel and Cormode, Graham and Maple, Carsten},
  title     = {FLAIM: AIM-based Synthetic Data Generation in the Federated Setting},
  year      = {2024},
  isbn      = {9798400704901},
  url       = {https://doi.org/10.1145/3637528.3671990},
  doi       = {10.1145/3637528.3671990},
  booktitle = {Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages     = {2165--2176},
  numpages  = {12},
  location  = {Barcelona, Spain},
  series    = {KDD '24}
}

Install the required Python environment via conda and pip:
conda create -n "flaim" python=3.9
conda activate flaim
pip install -r ./requirements.txt
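Optionally, you can verify the environment is active and uses the interpreter version this repo targets before continuing (a simple sanity check, not part of the original setup steps):
conda activate flaim
python --version    # should report Python 3.9.x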
AutoDP dependency: download the AutoDP repo here and run the following in the root of the autodp directory:
pip install .
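A minimal sketch of the full AutoDP install, assuming the repository is the autodp project by Yu-Xiang Wang on GitHub (the URL below is an assumption; prefer the link above if it differs):
git clone https://github.com/yuxiangw/autodp.git    # assumed repository URL
cd autodp
pip install .    # installs AutoDP into the active flaim environment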
All datasets are downloaded automatically via PMLB or the Synthetic Data Vault during the first run, except for the Adult, Covtype, and Marketing datasets, which require manual downloading:
- Download 'adult.csv' from here and place it under synth_fl/data/
- Download 'covtype.csv' from here and place it under synth_fl/data/
- Download 'marketing.csv' from here and place it under synth_fl/data/
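As an illustrative sketch (the ~/Downloads location and the exact names of your downloaded files are assumptions), the files can be moved into place from a terminal as follows:
mkdir -p synth_fl/data    # create the data directory if it does not exist
mv ~/Downloads/adult.csv ~/Downloads/covtype.csv ~/Downloads/marketing.csv synth_fl/data/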
To run experiments, the partitions of client data for the federated setting must first be generated. These partitions are created and stored under synth_fl/data.
Run the following command to produce non-IID partitions of the benchmark datasets:
python3.9 launcher.py --sweep-name paper/cache_answers/cache_split_answers1.json --sweep-manager-type local --sweep-backend local
and to produce the SynthFS synthetic dataset:
python3.9 launcher.py --sweep-name paper/cache_answers/cache_split_answers2.json --sweep-manager-type local --sweep-backend local
Note: this may take a while (around 10-20 minutes) and will save the client partition splits to synth_fl/data.
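As a quick sanity check (assuming the splits are written as files under synth_fl/data, as noted above), you can list the directory after both commands finish:
ls synth_fl/data    # the generated client partition splits should appear here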
Further configs for experiments are contained within sweep_configs/paper/.
These can be run locally (using 4 CPU threads) as follows:
python3.9 launcher.py --sweep-backend local --sweep-manager-type local --sweep-name paper/SWEEP_NAME --workers 4
Data from experiments is saved under slurm/job_results and slurm/sweep_results.
SWEEP_NAME should be one of the following config files contained within sweep_configs/paper:
- varying_eps - Used to produce Figure 3(a) in the main paper and Figure 6 in the Appendix.
- varying_feature_skew - Used to produce Figure 1.
- varying_local_rounds - Used to produce Figures 3(e,f) in the main paper and Figure 10 in the Appendix.
- varying_p - Used to produce Figure 3(c) in the main paper and Figure 8 in the Appendix.
- varying_t - Used to produce Figure 3(b) and Table 1 in the main paper and Figure 7 and Table 6 in the Appendix.
- varying_beta - Used to produce Figure 3(d) in the main paper and Figure 9 in the Appendix.
- baselines.json - Trains the FLAIM baseline methods, used to produce Table 1.
- communication_tracking - Used to produce Table 3 in the main paper and Table 7 in the Appendix.
- appendix_non_iid_split - Used to produce Table 5 in the Appendix.
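For example, a concrete instance of the command template above for the varying epsilon sweep (whether the .json extension must be appended to the sweep name is an assumption; match how the config file is named under sweep_configs/paper/):
python3.9 launcher.py --sweep-backend local --sweep-manager-type local --sweep-name paper/varying_eps --workers 4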
We would like to acknowledge the following code that is used by this repo:
- PMLB - For dataset loading
- Synthetic Data Vault - For additional datasets
- Private-PGM by Ryan McKenna
- AutoDP by Yu-Xiang Wang
- FLSim by Facebook Research (for running federated CTGAN examples)