By Anish Shivram Gawde, Charles Ciampa, Daniel Alvarez
For our NLP final project, the aim is to measure and understand bias transfer when performing model distillation on an LLM.
We have split the code into five folders: data_prep, data, training, models, and analysis.
The data_prep folder contains the code to obtain the log probabilities of the labels from the teacher model. The label probabilities are then saved to a CSV in the data folder.
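The actual extraction lives in data_prep/data_prep.ipynb; as a rough sketch of the final step, the conversion from raw logits to saved label probabilities might look like this (the logits, texts, and output path here are all illustrative):

```python
import os
import tempfile

import numpy as np
import pandas as pd

def logits_to_label_probs(logits):
    """Convert raw teacher logits into label probabilities with a softmax."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical teacher logits for three reviews over two labels (negative, positive).
teacher_logits = [[2.0, 0.5], [-1.0, 1.5], [0.0, 0.0]]
probs = logits_to_label_probs(teacher_logits)

# Save one probability column per label alongside the text.
df = pd.DataFrame(probs, columns=["label_0", "label_1"])
df.insert(0, "text", ["review a", "review b", "review c"])
out_path = os.path.join(tempfile.gettempdir(), "train.csv")
df.to_csv(out_path, index=False)
```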
The data folder contains the data used to train the student models, i.e., the label probabilities saved from the teacher model.
The training folder contains the code for the student models, which are trained on the teacher-model data found in the data folder. After training the student models, be sure to save them to the models folder so they can later be evaluated for bias.
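The distillation itself boils down to fitting the student to the teacher's label distribution. A minimal NumPy sketch of a soft-label cross-entropy loss (the real training code may use a different loss or framework):

```python
import numpy as np

def soft_label_cross_entropy(student_probs, teacher_probs, eps=1e-12):
    """Average cross-entropy between the teacher's soft labels and the
    student's predicted label distribution (lower is a closer match)."""
    student_probs = np.clip(np.asarray(student_probs, dtype=float), eps, 1.0)
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    return float(-(teacher_probs * np.log(student_probs)).sum(axis=1).mean())

# Teacher soft labels (as stored in the data folder) and two candidate students.
teacher = np.array([[0.9, 0.1], [0.2, 0.8]])
close_student = np.array([[0.85, 0.15], [0.25, 0.75]])
far_student = np.array([[0.3, 0.7], [0.6, 0.4]])
```

By Gibbs' inequality the loss is minimized when the student exactly reproduces the teacher's distribution, which is what drives the bias transfer this project measures.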
The models folder contains the student models produced by the training code. This folder is used by the code in the training and analysis folders.
The analysis folder contains the code for analyzing bias in the student and teacher models. It also has a subfolder for the results from the IMDB test dataset, as well as a graphs folder with the graphs created for the analysis.
The training, models, and analysis folders currently contain temp.txt placeholder files, since they do not yet contain any code. When other files are added to these folders, delete the temp.txt files; they exist only to preserve the repository structure.
This section contains instructions for running data_prep/data_prep.ipynb (which obtains the label probabilities from the teacher model and saves them to the data folder) and analysis/teacher_analysis.ipynb (which creates graphs and analyzes the results from the student models).
There are two ways to set up the environment: using pixi, or a manual installation. This installation applies only to the data_prep/data_prep.ipynb and analysis/teacher_analysis.ipynb code.
To install all the libraries, we used pixi (https://pixi.sh) for environment handling. Once you have pixi installed, go to the parent directory containing this project and run pixi install. On Ubuntu this installs all of the libraries automatically.
If you are not able to install using pixi, the required packages are as follows.
- Python 3.11
- NumPy ≥1.26
- Pandas ≥2.2
- Scikit-learn ≥1.7.2
- Matplotlib
- Seaborn
- JupyterLab ≥4.0
- PyTorch ≥2.4.0 (CUDA 12.4)
- Transformers ≥4.38
- tqdm ≥4.65
Feel free to install these packages with whatever environment manager you prefer.
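For reference, a pixi.toml dependency table matching the list above might look roughly like the following. This is only a sketch: the repository's actual pixi.toml is authoritative, and the project name, channel, platform, and CUDA handling here are assumptions.

```toml
[project]
name = "nlp-distillation-bias"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
python = "3.11.*"
numpy = ">=1.26"
pandas = ">=2.2"
scikit-learn = ">=1.7.2"
matplotlib = "*"
seaborn = "*"
jupyterlab = ">=4.0"
pytorch = ">=2.4.0"   # CUDA 12.4 build selection omitted in this sketch
transformers = ">=4.38"
tqdm = ">=4.65"
```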
Open data_prep/data_prep.ipynb and run all cells. If the installation was performed correctly, this automatically downloads the training and test data as well as the model, and saves the results to the data/llama3.1 folder as train.csv and test.csv. Both files contain the probability the teacher model assigned to each label.
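A quick sanity check after the notebook finishes is that each row's label probabilities sum to one. A sketch (in practice you would `pd.read_csv("data/llama3.1/train.csv")`; the in-memory stand-in and column names here are assumptions):

```python
import io

import pandas as pd

# Stand-in for data/llama3.1/train.csv so the sketch is self-contained.
csv_text = """text,label_0,label_1
"great movie",0.05,0.95
"terrible plot",0.90,0.10
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each row's label probabilities should sum to (approximately) one.
row_sums = df[["label_0", "label_1"]].sum(axis=1)
```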
Open analysis/teacher_analysis.ipynb and run all cells. If the installation was performed correctly, all cells should run without error, and all graphs will be saved to the graphs subdirectory.
Run the file training/train_distilbert_model.py. The script takes the following command-line arguments:
- --train: accepted values are "True" or "False". If True, a new model is trained. Optional; defaults to True.
- --predict: accepted values are "True" or "False". If True, the script generates predictions. If --train is True, the newly trained model is used; otherwise the most recently trained DistilBERT model is used. Optional; defaults to True.
- --model: accepted values are "baseline" or "distilled". baseline trains/uses the model created from the IMDB dataset; distilled trains/uses the model created from the teacher's soft labels. Mandatory.
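The interface above can be sketched with argparse (a sketch only; the actual script's parsing details may differ):

```python
import argparse

def build_parser():
    """Mirror the command-line interface described above."""
    parser = argparse.ArgumentParser(
        description="Train and/or predict with the DistilBERT student models.")
    parser.add_argument("--train", choices=["True", "False"], default="True",
                        help="If True, train a new model (optional).")
    parser.add_argument("--predict", choices=["True", "False"], default="True",
                        help="If True, generate predictions (optional).")
    parser.add_argument("--model", choices=["baseline", "distilled"], required=True,
                        help="baseline: IMDB-trained model; distilled: teacher soft-labels.")
    return parser

# Example invocation: predict with the distilled model without retraining.
args = build_parser().parse_args(["--model", "distilled", "--train", "False"])
```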
Predictions are saved in the analysis/datasets directory as distilbert_distilled_IMDB.parquet, distilbert_distilled_EEC.parquet, distilbert_trained_IMDB.parquet, and distilbert_trained_EEC.parquet. Newly trained models are saved to models/distilbert_baseline_model and models/distilbert_distilled_model.
All necessary files to be uploaded can be found in the data folder.
- Baseline Bag-of-Words (BoW) Notebook: baseline BoW model and predictions on the EEC dataset.
  - Upload: train.csv (IMDB training data: text, label), test.csv (IMDB test data), 0000.parquet (EEC dataset)
  - Run: execute all cells.
  - Outputs: vocab_full.pkl, imdb_bow_test_predictions.csv, eec_predictions_bow_baseline.csv
- Distilled Bag-of-Words (BoW) Notebook: distilled BoW model and predictions on the EEC dataset.
  - Upload: train.csv (must include label_0, label_1), test.csv, 0000.parquet, vocab_full.pkl
  - Run: execute all cells.
  - Outputs: imdb_bow_distilled_test_predictions.csv, eec_predictions_bow_distilled1.csv
- Baseline CNN Notebook: baseline CNN model and predictions on the EEC dataset.
  - Upload: train.csv, test.csv, 0000.parquet
  - Run: execute all cells.
  - Outputs: vocab_CNN.pkl, imdb_cnn_test_predictions.csv, eec_predictions_cnn_baseline.csv
- Distilled CNN Notebook: distilled CNN model and predictions on the EEC dataset.
  - Upload: train.csv (must include label_0, label_1), test.csv, 0000.parquet, vocab_CNN.pkl
  - Run: execute all cells.
  - Outputs: imdb_cnn_distilled_test_predictions.csv, eec_predictions_cnn_distilled1.csv
- Bias Evaluation Notebook: bias evaluation (Kiritchenko & Mohammad, 2018).
  - Upload: eec_predictions_bow_baseline.csv, eec_predictions_bow_distilled1.csv, eec_predictions_cnn_baseline.csv, eec_predictions_cnn_distilled1.csv, distilbert_trained_EEC_predictions.parquet, distilbert_distilled_EEC_predictions.parquet, teacher_model_predictions.parquet
  - Run: execute all cells.
  - Outputs: bias statistics table; high-resolution plots gender_bias.png and race_bias.png
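The core of the Kiritchenko & Mohammad (2018) evaluation is comparing a model's predictions on EEC sentence pairs that differ only in the identity term. A toy sketch of the gender-gap statistic (the column names and values here are assumptions, not the actual prediction-file schema):

```python
import pandas as pd

# Toy stand-in for an EEC predictions file; p_positive is the model's
# positive-sentiment probability for each templated sentence.
preds = pd.DataFrame({
    "template_id": [0, 0, 1, 1],
    "gender": ["female", "male", "female", "male"],
    "p_positive": [0.62, 0.70, 0.40, 0.45],
})

# Pair sentences by template and take the mean male-minus-female gap.
pivot = preds.pivot(index="template_id", columns="gender", values="p_positive")
gender_gap = (pivot["male"] - pivot["female"]).mean()
```

A gap near zero suggests the model scores the paired sentences similarly; the bias notebook's statistics table and plots summarize such gaps across models.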