
Prototype cell key perturbation to enhance disclosure control against difference attacks #251

@tombisho

Stefan has demonstrated two ways of recovering individual values from DataSHIELD using difference attacks. In short:

(1) by comparing the mean of a column with all rows and with one row removed
(2) by comparing the mean of a column with all rows and with one row duplicated
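To make the arithmetic behind both attacks concrete, here is a minimal Python sketch on a hypothetical column. It assumes the attacker also knows the row count n. If S is the sum over all n rows and x is the target value, the full mean is S/n. Removing the row gives (S - x)/(n - 1), and duplicating it gives (S + x)/(n + 1), so x can be solved for exactly in either case.

```python
from statistics import mean

# Hypothetical column on the server; the analyst only ever sees aggregates such as means.
values = [52.0, 61.5, 47.25, 70.0, 58.5, 66.0]
target = 3        # hypothetical index of the row the attacker wants to recover
n = len(values)   # assumed to be known to the attacker

mean_all = mean(values)

# Attack (1): mean over all rows vs. mean with the target row removed.
mean_removed = mean(values[:target] + values[target + 1:])
recovered_removed = n * mean_all - (n - 1) * mean_removed

# Attack (2): mean over all rows vs. mean with the target row duplicated.
mean_duplicated = mean(values + [values[target]])
recovered_duplicated = (n + 1) * mean_duplicated - n * mean_all

print(round(recovered_removed, 6), round(recovered_duplicated, 6))
```

Both recovered values equal `values[target]` up to floating-point rounding, even though every mean was computed over five or more rows.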

This is hard to protect against because both subsets generally contain large numbers of rows, so neither one looks disclosive on its own. It is only the difference between the two results that isolates a single row.

Research indicates that the most effective protection against difference attacks is to add noise to the outputs. The R package cellKey implements the cell key method for adding noise to tables, and could be repackaged for DataSHIELD use.
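The general cell key idea can be sketched in Python. This sketch is a rough, hypothetical illustration applied to a sum. It is not the cellKey package's API or its exact algorithm, and the p-table, noise scale, seed and function names are all assumptions. Each record gets a fixed random key in [0, 1). A cell's key is the fractional part of the sum of its records' keys, and the noise for the cell is looked up from a perturbation table (p-table) using that key. The two subsets in a difference attack contain different records, so they usually get different noise. The subtraction then returns the target value plus an unknown multiple of the scale. That multiple can be zero, but the attacker cannot tell when it is.

```python
import random
from bisect import bisect_right

# Hypothetical p-table: noise values with integer weights 1, 2, 4, 2, 1 (out of 10).
NOISE_VALUES = [-2, -1, 0, 1, 2]
CUM_WEIGHTS = [1, 3, 7, 9, 10]
TOTAL_WEIGHT = CUM_WEIGHTS[-1]

def assign_record_keys(n_rows, seed):
    """Give each record a fixed random key in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_rows)]

def cell_key(record_keys, rows):
    """Cell key: fractional part of the sum of the record keys in the cell."""
    return sum(record_keys[i] for i in rows) % 1.0

def noise_for(ck):
    """Inverse-CDF lookup of the noise value for a cell key in the p-table."""
    idx = bisect_right(CUM_WEIGHTS, ck * TOTAL_WEIGHT)
    return NOISE_VALUES[min(idx, len(NOISE_VALUES) - 1)]  # guard against rounding at ck ~ 1

def perturbed_sum(values, record_keys, rows, scale):
    """True sum over the rows plus scaled, cell-key-driven noise."""
    true_sum = sum(values[i] for i in rows)
    return true_sum + scale * noise_for(cell_key(record_keys, rows))

values = [52.0, 61.5, 47.25, 70.0, 58.5, 66.0]    # hypothetical column
keys = assign_record_keys(len(values), seed=251)  # hypothetical seed
target, scale = 3, 5.0                            # hypothetical target row and noise scale

all_rows = list(range(len(values)))
without_rows = [i for i in all_rows if i != target]

# Difference attack on perturbed sums (n * mean): the noise terms no longer cancel.
estimate = (perturbed_sum(values, keys, all_rows, scale)
            - perturbed_sum(values, keys, without_rows, scale))
print(estimate, values[target])
```

Perturbing the sum is one simple choice here. A perturbed mean would follow by dividing by the (possibly also perturbed) count.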

The issues to address are:

  • when to apply the noise: on import of the data into the session, or when the data are split into subsets?
  • the cell key method has been used for census data, where it is usually evaluated against the particular dataset to check that it is appropriate. How would that work for DataSHIELD, which serves many diverse datasets?
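One property that bears on the first question can be sketched in Python. The example is hypothetical: names, seeds and the noise scale are illustrative, and it maps the cell key through a Gaussian inverse CDF rather than a p-table. If noise is drawn fresh each time a query is answered, repeating the same query and averaging removes most of it. If instead each record receives a fixed random key before any subsetting (for example on import) and the noise is derived from the keys of the rows involved, the same subset always gets the same noise, so repetition reveals nothing new.

```python
import random
from statistics import NormalDist, mean

values = [52.0, 61.5, 47.25, 70.0, 58.5, 66.0]  # hypothetical column
rows = [0, 2, 3, 5]                              # hypothetical subset
true_sum = sum(values[i] for i in rows)
NOISE = NormalDist(mu=0.0, sigma=5.0)            # hypothetical noise distribution

# Option A: fresh random noise every time the query is answered.
query_rng = random.Random(1)

def fresh_noise_sum(subset):
    return sum(values[i] for i in subset) + query_rng.gauss(0.0, 5.0)

# Option B: a fixed random key per record, assigned once (e.g. on import);
# the noise for a subset is a deterministic function of its records' keys.
key_rng = random.Random(2)
record_keys = [key_rng.random() for _ in values]

def keyed_noise_sum(subset):
    ck = sum(record_keys[i] for i in subset) % 1.0
    ck = min(max(ck, 1e-12), 1 - 1e-12)  # inv_cdf needs a value strictly inside (0, 1)
    return sum(values[i] for i in subset) + NOISE.inv_cdf(ck)

averaged_fresh = mean(fresh_noise_sum(rows) for _ in range(10_000))
keyed_answers = {keyed_noise_sum(rows) for _ in range(10_000)}

print(abs(averaged_fresh - true_sum))  # small: averaging has cancelled most of the noise
print(len(keyed_answers))              # 1: every repeat returns the same perturbed value
```

If the keys were instead drawn each time the data are split, re-running the subsetting step would yield fresh keys and reopen the averaging route.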
