This script anonymizes sensitive data in your datasets. It generates pseudonyms for specified columns in a CSV file using a salted SHA-256 hash, and it uses an HMAC for integrity checks. Anonymized data can be reverted to its original form using the encrypted mapping files that are generated during the anonymization process.
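
As a rough illustration of salted SHA-256 pseudonymization, here is a minimal sketch. The function name `pseudonymize`, the example salt, and the choice to prepend the salt to the value are assumptions made for illustration; the script itself may derive its salt from the secret key or combine the inputs differently.

```python
import hashlib


def pseudonymize(value: str, salt: bytes) -> str:
    """Return a hex pseudonym: SHA-256 over the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Hypothetical salt for demonstration only; a real salt must stay secret.
salt = b"example-salt"
alice = pseudonymize("alice@example.com", salt)
```

The same value and salt always yield the same pseudonym, so equal values in a column stay equal after anonymization, while a different salt yields unrelated pseudonyms.
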
- Load or Generate Secret Key (if it does not exist)
  - The key should be 32 bytes (256 bits) long and base64-encoded
- Process Input Data File
- Data Pseudonymization or Reversion
- Encrypted Mapping Files
  - During the anonymize operation, for each specified column, the script creates an encrypted file that maps the pseudonyms back to the original data.
  - These mapping files are encrypted using the Fernet symmetric encryption scheme, and an HMAC is appended to ensure data integrity.
  - During the revert operation, the script uses these mapping files to restore the original data.
- Data Integrity
  - When reverting data, the script first checks the integrity of the encrypted mapping files by comparing a stored HMAC with a computed HMAC.
- Output
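
The HMAC handling described in the steps above could look roughly like the following sketch. It assumes HMAC-SHA256 with the raw 32-byte tag appended to the end of the encrypted bytes; the function names are hypothetical, and `token` stands in for the Fernet-encrypted mapping, which is treated here as opaque bytes. The script's actual file layout may differ.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def append_hmac(token: bytes, key: bytes) -> bytes:
    """Return the encrypted bytes followed by their HMAC tag."""
    tag = hmac.new(key, token, hashlib.sha256).digest()
    return token + tag


def verify_and_strip(blob: bytes, key: bytes) -> bytes:
    """Compare the stored HMAC with a computed one; return the encrypted bytes."""
    if len(blob) < TAG_LEN:
        raise ValueError("mapping file is too short to contain an HMAC")
    token, stored = blob[:-TAG_LEN], blob[-TAG_LEN:]
    computed = hmac.new(key, token, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(stored, computed):
        raise ValueError("integrity check failed: mapping file was modified")
    return token
```

Verifying before decrypting means a modified mapping file is rejected before any of its contents are used to revert data.
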
 
    pip3 install -r requirements.txt

    python3 anonymizer.py file_path operation --cols column_names --key_path secret_key_path

- file_path: Path to the data file (CSV format)
- operation: anonymize or revert
- --cols: Specific columns to anonymize or revert (all columns by default)
- --key_path: Path to the secret key file (required)
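
For readers who want to see how such a command line could be parsed, here is a minimal `argparse` sketch of the arguments listed above. It is not taken from `anonymizer.py`; in particular, accepting `--cols` as space-separated names via `nargs="*"` is an assumption, since the expected format of `column_names` is not stated.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="anonymizer.py")
    parser.add_argument("file_path", help="Path to the data file (CSV format)")
    parser.add_argument("operation", choices=["anonymize", "revert"])
    # Assumption: column names are space-separated; None means all columns.
    parser.add_argument("--cols", nargs="*", default=None,
                        help="Columns to anonymize or revert (default: all)")
    parser.add_argument("--key_path", required=True,
                        help="Path to the secret key file")
    return parser


args = build_parser().parse_args(
    ["data.csv", "anonymize", "--key_path", "secret_key.key"]
)
```
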
 
- Generate a data example

      python3 data.py

- Anonymize

      python3 anonymizer.py data.csv anonymize --key_path secret_key.key

- Revert

      python3 anonymizer.py data.csv revert --key_path secret_key.key

- Secret Key Storage - ensure the secret key file is stored securely. If compromised, an attacker could decrypt the pseudonym mappings and de-anonymize the data.
- Encrypted Mapping Files - ensure that these files are stored in a secure location with restricted access. Access to these files and the secret key allows data de-anonymization.
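
One way to reduce the risk of key exposure is to create the key file with owner-only permissions. The sketch below is an illustration under stated assumptions, not the script's own code: the function name `load_or_create_key` is hypothetical, and the key is generated as URL-safe base64 of 32 random bytes, which is the key format Fernet expects. On Windows the `0o600` mode has limited effect.

```python
import base64
import os


def load_or_create_key(path: str) -> bytes:
    """Load a base64-encoded 32-byte key, creating it if the file is missing."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            key = fh.read().strip()
    else:
        key = base64.urlsafe_b64encode(os.urandom(32))
        # O_EXCL refuses to overwrite an existing file; 0o600 limits
        # read/write access to the file owner on POSIX systems.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as fh:
            fh.write(key)
    if len(base64.urlsafe_b64decode(key)) != 32:
        raise ValueError("secret key must decode to exactly 32 bytes")
    return key
```

Calling `load_or_create_key("secret_key.key")` twice returns the same key, because the second call reads the file written by the first.
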
 


