emowat/266-final-project

266-final-project

Lightweight LLM Cybersecurity Guardrails

This repository contains the notebooks, utilities and data sets used to generate the initial train/validation/holdout prompts.

benign-prompts: contains the scripts and data used to generate 4000 benign cybersecurity prompts using the methodology described in the CySecBench paper.

benign-ood-prompts: script and data to build the initial set of out-of-distribution prompts using Databricks Dolly and Code_Alpaca datasets.

notebooks:

  • Guardrail_Model_Trainer_V4.ipynb - the main training notebook, which evaluates each of the 3 BERT models against the train/validation data
  • Holdout_Set_Evaluator.ipynb - computes accuracy and performance for each of the different holdout categories.
  • Holdout_Set_Evaluator_threshold.ipynb - evaluates the holdout sets using different thresholds. The output shows the evaluation at thresholds 0.9 and 0.95 for ModernBERT. Results are added to the HoldoutResults folder.
  • LatencyCalc.ipynb - since the LlamaGuard/WildGuard timings included tokenization, this notebook computes tokenization timings for the BERT models so they can be added to the final tallies.
  • LlamaGuard3.ipynb - training notebook based on Guardrail trainer for LlamaGuard3
  • Adversarial_Training_Set_Attacker.ipynb - using the V1 DistilBERT model, attacks prompts in the train/validation sets using TextFooler and DeepWordBug.
  • Adversarial_Validator.ipynb - uses LLM-as-a-Judge to validate that attacked prompts retain their original malicious intent
  • LMSYS_Benign_Validator.ipynb - attempts to collect 13000 benign prompts from the LMSYS dataset using LLM-as-a-Judge for validation
  • WildGuard.ipynb - evaluation notebook for WildGuard on the holdout data
  • ModernBERT_and_CySecBERT_Training - initial training runs of the ModernBERT and CySecBERT models
  • utilities: Prompt generation and combiner scripts used to create datasets on the laptop
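
The threshold evaluation described for Holdout_Set_Evaluator_threshold.ipynb can be illustrated with a minimal sketch. This is not the notebook's code. The function name `evaluate_at_threshold`, the toy scores, and the rule "flag a prompt as malicious when its score is at or above the threshold" are all assumptions made for illustration. Only the threshold values 0.9 and 0.95 come from the description above.

```python
from typing import Dict, Sequence


def evaluate_at_threshold(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Score binary predictions at a given decision threshold.

    Assumption: probs[i] is the model's probability that prompt i is
    malicious, and labels[i] is 1 for malicious, 0 for benign.
    """
    # Assumed decision rule: flag as malicious only when score >= threshold.
    preds = [1 if p >= threshold else 0 for p in probs]

    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 0)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)

    return {
        "threshold": threshold,
        "accuracy": (tp + tn) / len(labels),
        # Share of malicious prompts that were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of benign prompts that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy scores and labels, invented for this example (not project data).
probs = [0.99, 0.93, 0.97, 0.40, 0.92, 0.10]
labels = [1, 1, 1, 0, 0, 0]

results = {t: evaluate_at_threshold(probs, labels, t) for t in (0.9, 0.95)}
for t, metrics in results.items():
    print(t, metrics)
```

On this toy data, raising the threshold from 0.9 to 0.95 removes the one false positive but misses one malicious prompt, which is the kind of trade-off a threshold sweep is meant to expose.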

Data Files are located at: https://drive.google.com/drive/folders/1UoCgWYPguyHSE-mVj9Ch6DcDBQtI4F3i?usp=sharing

  • HoldoutResults - contains the data used in the paper
  • guardrail_model* - v4 and v1 are used for the paper
  • train_dataset_v4.csv / val_dataset_v4.csv are the final datasets used for the V4 model

Slide Deck Presentation: https://docs.google.com/presentation/d/1E_42ESCS_uALaqoH6uJPdCVYU4ZIKr1X238EW7N3Bj4/edit?slide=id.g3ae1fd30d20_0_21#slide=id.g3ae1fd30d20_0_21
