266-final-project

Lightweight LLM Cybersecurity Guardrails

This repository contains the notebooks, utilities and data sets used to generate the initial train/validation/holdout prompts.

benign-prompts: contains the scripts and data used to generate 4000 benign cybersecurity prompts using the methodology described in the CySecBench paper.

benign-ood-prompts: contains the script and data used to build the initial set of out-of-distribution prompts from the Databricks Dolly and Code_Alpaca datasets.

notebooks:

Guardrail_Model_Trainer_V4.ipynb - the main training notebook, which evaluates each of the 3 BERT models against the train/validation data.
Holdout_Set_Evaluator.ipynb - computes accuracy and performance for each of the holdout categories.
Holdout_Set_Evaluator_threshold.ipynb - evaluates the holdout sets at different thresholds. The output shows the 0.9 and 0.95 evaluations for ModernBERT. Results are added to the HoldoutResults folder.
LatencyCalc.ipynb - the LlamaGuard/WildGuard timings included tokenization, so this notebook computes tokenization timings for the BERT models so they can be added to the final tallies.
LlamaGuard3.ipynb - training notebook for LlamaGuard3, based on the Guardrail trainer.
Adversarial_Training_Set_Attacker.ipynb - using the V1 DistilBERT model, attacks prompts in the train/val sets with TextFooler and DeepWordBug.
Adversarial_Validator.ipynb - uses LLM-as-a-Judge to validate that attacked prompts retain their original malicious intent.
LMSYS_Benign_Validator.ipynb - attempts to collect 13000 benign prompts from the LMSYS dataset, using LLM-as-a-Judge for validation.
WildGuard.ipynb - evaluation notebook for WildGuard on the holdout data.
ModernBERT_and_CySecBERT_Training - initial training runs of the ModernBERT and CySecBERT models.
utilities: Prompt generation and combiner scripts used to create datasets on the laptop
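
The latency adjustment described for LatencyCalc.ipynb can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: `mean_latency_ms`, `whitespace_tokenize`, the sample prompts and the `inference_ms` figure are all hypothetical stand-ins. In practice the timed function would be the BERT model's own tokenizer and the inference time would come from the actual measurements.

```python
import time
from statistics import mean


def mean_latency_ms(fn, prompts, repeats=5):
    """Return the average per-prompt wall-clock time of fn, in milliseconds."""
    timings = []
    for _ in range(repeats):
        for prompt in prompts:
            start = time.perf_counter()
            fn(prompt)
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def whitespace_tokenize(text):
    """Hypothetical stand-in for a model tokenizer (assumption for this sketch)."""
    return text.lower().split()


# Made-up prompts, for illustration only.
prompts = [
    "How do I rotate TLS certificates on an nginx server?",
    "Explain what a SQL injection attack is.",
]

tokenization_ms = mean_latency_ms(whitespace_tokenize, prompts)

# Placeholder number, NOT a measured result from the paper.
inference_ms = 4.2

# Adding tokenization time makes the BERT timing comparable to timings
# that already include tokenization.
total_ms = inference_ms + tokenization_ms
print(f"tokenization: {tokenization_ms:.4f} ms, total: {total_ms:.4f} ms")
```

Timing each prompt separately and averaging over several repeats keeps the figure per-prompt, which is the unit the final tallies would need if the other models' timings are also reported per prompt.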

Data Files are located at: https://drive.google.com/drive/folders/1UoCgWYPguyHSE-mVj9Ch6DcDBQtI4F3i?usp=sharing

HoldoutResults - contains the data used in the paper
guardrail_model* - v4 and v1 are used for the paper
train_dataset_v4.csv / val_dataset_v4.csv are the final datasets used for the V4 model
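
The threshold evaluation that Holdout_Set_Evaluator_threshold.ipynb writes to HoldoutResults can be pictured with the sketch below. It assumes the classifier outputs a probability that a prompt is malicious and that a prompt is flagged only when that probability reaches the threshold. The function name, the scores and the labels are made up for illustration and are not taken from the notebooks or the results files.

```python
def evaluate_at_threshold(malicious_probs, labels, threshold):
    """Score binary predictions made by thresholding the malicious probability.

    labels use 1 for malicious and 0 for benign (an assumption of this sketch).
    """
    preds = [1 if p >= threshold else 0 for p in malicious_probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return {
        "threshold": threshold,
        "accuracy": (tp + tn) / len(labels),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up scores and labels, for illustration only.
scores = [0.99, 0.93, 0.97, 0.40, 0.92, 0.05]
labels = [1, 1, 1, 0, 0, 0]

results = {t: evaluate_at_threshold(scores, labels, t) for t in (0.5, 0.9, 0.95)}
for t, row in results.items():
    print(t, row)
```

Raising the threshold trades recall for fewer false positives: in this toy data the benign prompt scored 0.92 is no longer flagged at 0.95, but the malicious prompt scored 0.93 is missed.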

Slide Deck Presentation: https://docs.google.com/presentation/d/1E_42ESCS_uALaqoH6uJPdCVYU4ZIKr1X238EW7N3Bj4/edit?slide=id.g3ae1fd30d20_0_21#slide=id.g3ae1fd30d20_0_21

