Skip to content

IFCA-Advanced-Computing/trasgoDP

Repository files navigation

License: Apache 2.0 codecov PyPI Documentation Status Publish Package in PyPI CI/CD Pipeline Code Coverage Python version

TrasgoDP implements different mechanims for ε-differential privacy and (ε, δ)-differential privacy. The mechanisms are implemented for being used under a local approach, adding noise directly to the raw data. Two types of mechanims are implemented:

  • For numerical records: Laplace and Gaussian mechanisms. The implementation includes a final clipping applyied on the data with DP.
  • For categorical records: Exponential mechanism and Randomized Response (both for binary attributes and the k-ary version).

This library provides dedicated function designed for being applied on both pandas dataframes and lists/numpy arrays.

Installation

You can install trasgoDP using pip. We recommend to use Python3 with virtualenv:

virtualenv .venv -p python3
source .venv/bin/activate
pip install trasgoDP

Mechanisms implemented

Mechanism Type of the attribute Function in trasgoDP
Laplace Numerical numerical.dp_clip_laplace()
Gaussian Numerical numerical.dp_clip_gaussian()
Exponential Categorical categorical.dp_exponential()
Randomized response Categorical (binary) categorical.dp_randomized_response_binary()
k-ary randomized response Categorical categorical.dp_randomized_response_kary()

Getting started

For applying DP mechanisms to a column of a dataframe you need to introduce:

  • The pandas dataframe with the data.
  • The column in the dataframe to be privatized.
  • The privacy budget (ε).
  • The probability of exceeding the privacy budget (δ) in case of numerical attributes and the Gaussian mechanism.
  • The uper and lower bounds for numerical attributes (optional).

Example: apply DP to the adult dataset with the Laplace mechanism for the column age and the Exponential mechanism for the column workclass:

import pandas as pd
from trasgodp.numerical import dp_clip_laplace
from trasgodp.categorical import dp_exponential

# Read and process the data
data = pd.read_csv("examples/adult.csv")
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()

# Apply DP for the attribute age:
column_num = "age"
epsilon1 = 10
df = dp_clip_laplace(data, column_num, epsilon1, new_column=True)

# Apply DP for the attribute workclass:
column_cat = "workclass"
epsilon2 = 5
df = dp_exponential(data, column_cat, epsilon2, new_column=True)

Warning

This project is under active development.

License

This project is licensed under the Apache 2.0 license.

Related work

If you are using trasgoDP, you may also be interested in:

  • pyCANON: a Python library for checking the level of anonymity of a dataset.
  • anjana: a Python library for anonymizing tabular datasets.

Funding and acknowledgments

This work is funded by European Union through the SIESTA project (Horizon Europe) under Grant number 101131957.