A Python automation script that reads CVE scan requests from `scan.json`, sends them to a vulnerability service, and saves the results to a reports directory.
- Reads `scan.json` from a configurable input directory
- Generates payloads from templates for different languages/ecosystems
- Sends POST requests to the vulnerability service endpoint
- Saves results to `{scan_id}_{vuln_id}_{iteration}.json` in a configurable output directory
- Extracts data from result files and exports to CSV format
- Analyzes results against expected results and generates confusion matrices
- Archives reports into timestamped tar files
- Supports command-line arguments and environment variables
- Comprehensive error handling and logging
- Google Sheets integration for reading input data and writing analysis results
- Containerized for use in Tekton CI/CD pipelines
- Automated Docker image builds via GitHub Actions with push and manual trigger options
- Python 3.9+
- `requests` library
- `pandas` library (for data analysis)
- `openpyxl` library (for Excel export)
- `scikit-learn` library (for confusion matrix calculations)
- `gspread` library (for Google Sheets integration)
- `google-auth-oauthlib` library (for Google Sheets authentication)
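The dependency list above could be captured in a `requirements.txt` along these lines (unpinned versions shown for illustration; pin versions as appropriate for your environment):

```
requests
pandas
openpyxl
scikit-learn
gspread
google-auth-oauthlib
```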
Install dependencies:

```bash
pip install -r requirements.txt
```

Build the container image locally:

```bash
docker build -t quay.io/ecosystem-appeng/auto-cm-testing:latest .
```

The project includes a GitHub Actions workflow that automatically builds and pushes the Docker image to Quay.io. The workflow is located at `.github/workflows/build-push.yaml`.
Prerequisites:
- GitHub repository secrets configured:
- `QUAY_ROBOT_USERNAME`: Quay.io robot account username
- `QUAY_ROBOT_TOKEN`: Quay.io robot account token
Triggering the Build:
The workflow can be triggered in two ways:
1. Push with `[build]` tag (recommended):

   ```bash
   git commit -m "[build] Update dependencies"
   git push origin main
   ```

   The workflow will only run if the commit message contains `[build]`.

2. Manual trigger from GitHub UI:
   - Go to the Actions tab in your GitHub repository
   - Select "Build Image on Command" workflow
   - Click "Run workflow" button
   - Click "Run workflow" to start the build
Image Location:
The built image is pushed to: `quay.io/ecosystem-appeng/auto-cm-testing:latest`
Basic usage with default directories (`src/input` and `src/reports`):

```bash
python src/vulnerability_main_automation.py
```

Specify input and output directories:

```bash
python src/vulnerability_main_automation.py --input-dir /path/to/input --output-dir /path/to/output
```

Specify service URL:

```bash
python src/vulnerability_main_automation.py --input-dir /path/to/input --url http://localhost:26466/generate
```

Using environment variables:

```bash
export SERVICE_URL=http://localhost:26466/generate
python src/vulnerability_main_automation.py --input-dir /path/to/input
```

Available options:

- `--input-dir`: Input directory containing `scan.json` (default: `src/input` or `INPUT_DIR` env var)
- `--output-dir`: Output directory for result JSON files (default: `src/reports` or `OUTPUT_DIR` env var)
- `--url`: Service URL endpoint (default: `http://localhost:26466/generate` or `SERVICE_URL` env var)
- `--timeout`: Request timeout in seconds (default: 1800 seconds = 30 minutes, or `TIMEOUT` env var)
- `--log-level`: Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` (default: `DEBUG` or `LOG_LEVEL` env var)
- `--language`: Filter tests by language (e.g., `c`, `go`, `python`). If not specified, runs all languages (default: `None` or `LANGUAGE` env var)
- `--gsheets-mode`: Google Sheets integration mode: `none` (disabled), `input` (read from sheets), `output` (write to sheets), or `both` (default: `none` or `GSHEETS_MODE` env var)
- `--gsheets-input-sheet-id`: Google Sheets ID for reading input data (required if `gsheets-mode` is `input` or `both`)
- `--gsheets-output-sheet-id`: Google Sheets ID for writing results (required if `gsheets-mode` is `output` or `both`)
- `--gsheets-service-account-file`: Path to Google service account JSON file (required if `gsheets-mode` is not `none`)
- `--gsheets-tag`: Tag/label for the run (optional, for future use)
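The flag/env-var pairing described above is commonly implemented by using the environment variable as the argparse default. A minimal sketch of that pattern (flag names and env var names come from the list above; `build_parser` is a hypothetical helper, not the script's actual code):

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    """CLI flags fall back to env vars, then to hard-coded defaults."""
    parser = argparse.ArgumentParser(description="Vulnerability scan automation")
    parser.add_argument("--input-dir",
                        default=os.environ.get("INPUT_DIR", "src/input"),
                        help="Input directory containing scan.json")
    parser.add_argument("--output-dir",
                        default=os.environ.get("OUTPUT_DIR", "src/reports"),
                        help="Output directory for result JSON files")
    parser.add_argument("--url",
                        default=os.environ.get("SERVICE_URL",
                                               "http://localhost:26466/generate"),
                        help="Vulnerability service endpoint")
    parser.add_argument("--timeout", type=int,
                        default=int(os.environ.get("TIMEOUT", "1800")),
                        help="Request timeout in seconds")
    parser.add_argument("--log-level",
                        default=os.environ.get("LOG_LEVEL", "DEBUG"),
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    parser.add_argument("--language",
                        default=os.environ.get("LANGUAGE"),
                        help="Filter tests by language (e.g., c, go, python)")
    return parser

args = build_parser().parse_args(["--language", "go"])
```

With this arrangement an explicit flag always wins over the environment variable, which matches the precedence implied by the option list.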
The script expects:

- Input: `scan.json` (or `scan_generated.json` if generated from Google Sheets) in the input directory
- Output: Result JSON files named `{scan_id}_{vuln_id}_{iteration}.json` in the output directory
- Config: Configuration files in the `src/config/` directory:
  - `analysis_config.json`: Scanner label mapping and exploitability matrix (see Label System Reference for details)
  - `prodsec_expected_results_*.csv`: Expected results for each ecosystem
  - `sboms/*.sbom`: SBOM files referenced in scan configuration
- Templates: Payload templates in the `src/templates/` directory (e.g., `c_payload_template.json`, `go_payload_template.json`)
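Payload generation from a template can be pictured as loading the language template and overwriting its per-test fields from `scan.json`. A hedged sketch (the template dictionary below is an illustrative stand-in, not the real schema of `c_payload_template.json`):

```python
import copy
import json

# Illustrative stand-in for a file such as src/templates/c_payload_template.json
C_PAYLOAD_TEMPLATE = {
    "language": "c",
    "image": {"name": "", "tag": ""},
    "git": {"repo": "", "ref": ""},
    "use_sbom": False,
}

def build_payload(template: dict, test: dict) -> dict:
    """Fill a payload template with the fields of one test from scan.json."""
    payload = copy.deepcopy(template)  # never mutate the shared template
    for key in ("image", "git", "use_sbom", "sbom_file"):
        if key in test:
            payload[key] = test[key]
    return payload

test = {
    "language": "c",
    "vuln_id": "CVE-2025-1094",
    "image": {"name": "registry.redhat.io/rhel8/postgresql-13",
              "tag": "1-196.1724180180"},
    "git": {"repo": "https://github.com/postgres/postgres", "ref": "REL_13_14"},
    "use_sbom": True,
    "sbom_file": "sboms/postgresql-13-1-196.1724180180.sbom",
}
payload = build_payload(C_PAYLOAD_TEMPLATE, test)
print(json.dumps(payload, indent=2))
```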
Example `scan.json` structure:

```json
{
  "iterations": 1,
  "tests": [
    {
      "language": "c",
      "vuln_id": "CVE-2025-1094",
      "image": {
        "name": "registry.redhat.io/rhel8/postgresql-13",
        "tag": "1-196.1724180180"
      },
      "git": {
        "repo": "https://github.com/postgres/postgres",
        "ref": "REL_13_14"
      },
      "use_sbom": true,
      "sbom_file": "sboms/postgresql-13-1-196.1724180180.sbom"
    }
  ]
}
```

The script performs the following steps:
1. Input Generation (if Google Sheets input mode is enabled):
   - Reads test data from Google Sheets tabs (`C_Sheet` and `Go_Sheet`)
   - Writes data to CSV files (`prodsec_expected_results_*.csv`) to synchronize with Google Sheets
   - Generates `scan_generated.json` in the input directory

2. Execution:
   - Reads scan configuration from `scan.json` (or `scan_generated.json`)
   - Generates payloads from templates for each test
   - Sends POST requests to the vulnerability service
   - Saves results as `{scan_id}_{vuln_id}_{iteration}.json` files

3. Data Extraction:
   - Extracts key metrics from successful result JSON files
   - Exports extracted data to `extracted_data.csv`

4. Analysis:
   - Compares extracted data against expected results
   - Generates confusion matrices (categorical and binary)
   - Calculates performance metrics (Accuracy, Precision, Recall, F1 Score)
   - Exports analysis reports to console and Excel (`merged_data.csv`)
   - Writes results to Google Sheets (if output mode is enabled)

5. Cleanup:
   - Archives all result files into timestamped tar files
   - Moves archives to the `archive/` subdirectory
   - Deletes original result files from the reports directory
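The result-file naming convention used in the execution and cleanup steps can be sketched as a pair of pure helpers (the `{scan_id}_{vuln_id}_{iteration}.json` pattern comes from this README; the function names themselves are hypothetical):

```python
import json
from pathlib import Path

def result_filename(scan_id: str, vuln_id: str, iteration: int) -> str:
    """Build the {scan_id}_{vuln_id}_{iteration}.json file name."""
    return f"{scan_id}_{vuln_id}_{iteration}.json"

def save_result(output_dir: Path, scan_id: str, vuln_id: str,
                iteration: int, result: dict) -> Path:
    """Write one service response to the output directory and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / result_filename(scan_id, vuln_id, iteration)
    path.write_text(json.dumps(result, indent=2))
    return path
```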
Run the container with mounted volumes:

```bash
docker run --rm \
  -v /path/to/input:/app/input:ro \
  -v /path/to/output:/app/output \
  -e SERVICE_URL=http://service:8080/generate \
  quay.io/ecosystem-appeng/auto-cm-testing:latest \
  --input-dir /app/input \
  --output-dir /app/output
```

Note: The image is automatically built and pushed to Quay.io via GitHub Actions. See the Automated Build with GitHub Actions section for details.
The script supports reading input data from Google Sheets and writing analysis results to Google Sheets.
A Service Account is a special "robot" user that allows your application to access Google Sheets without requiring human authentication. Follow these steps to set it up:
Step 1: Create a Google Cloud Project
- Go to the Google Cloud Console
- Create a new project (or select an existing one)
- Note your project name for reference
Step 2: Enable the Google Sheets API
- In your project, navigate to "APIs & Services" > "Library"
- Search for "Google Sheets API" and click on it
- Click the "Enable" button
- Also search for and enable "Google Drive API" (required for file access)
Step 3: Create the Service Account
- Go to "APIs & Services" > "Credentials"
- Click "Create Credentials" and select "Service account"
- Give your service account a name (e.g., `vulnerability-automation`) and a description
- Click "Create and Continue"
- You can skip "Grant this service account access to project" (optional) and click "Continue"
- Skip "Grant users access to this service account" and click "Done"
Step 4: Get Your Credentials (the JSON Key File)
- You'll be back on the "Credentials" page. Find the service account you just created and click on it
- Go to the "Keys" tab
- Click "Add Key" > "Create new key"
- Select "JSON" as the key type and click "Create"
- A `.json` file will download to your computer

⚠️ Security Warning: Treat this file like a password! Do not share it publicly or commit it to version control. This file contains the private key your app will use to authenticate.
Step 5: Share Your Google Sheet with the Service Account
- Open the `.json` file you just downloaded in a text editor
- Find the `client_email` field. It will look something like: `vulnerability-automation@my-project-12345.iam.gserviceaccount.com`
- Copy this email address
- Open your Google Sheet (the one you want the script to access)
- Click the green "Share" button in the top-right corner
- Paste the `client_email` into the "Add people and groups" box
- Give it the appropriate permission:
  - Viewer: If your app only needs to read data (input mode only)
  - Editor: If your app needs to read and write data (output mode or both modes)
- Click "Share" (you can uncheck "Notify people" since this is a robot account)
Step 6: Get Your Google Sheet ID
- Open your Google Sheet in a web browser
- Look at the URL. It will look like: `https://docs.google.com/spreadsheets/d/1abc2def3ghi4jkl5mno6pqr/edit`
- The Sheet ID is the long string between `/d/` and `/edit`: `1abc2def3ghi4jkl5mno6pqr`
- Copy this ID - you'll need it for the `GSHEETS_INPUT_SHEET_ID` or `GSHEETS_OUTPUT_SHEET_ID` environment variables
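If you prefer to work with the full URL, the ID can also be pulled out programmatically. A small helper (hypothetical, not part of the script):

```python
import re

def sheet_id_from_url(url: str) -> str:
    """Extract the spreadsheet ID between /d/ and the next path separator."""
    match = re.search(r"/d/([a-zA-Z0-9_-]+)", url)
    if not match:
        raise ValueError(f"No spreadsheet ID found in: {url}")
    return match.group(1)

print(sheet_id_from_url(
    "https://docs.google.com/spreadsheets/d/1abc2def3ghi4jkl5mno6pqr/edit"))
# → 1abc2def3ghi4jkl5mno6pqr
```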
Step 7: Configure Environment Variables
Set up the following environment variables:
```bash
# Path to the downloaded service account JSON file
export GOOGLE_SERVICE_ACCOUNT_FILE=/path/to/service-account.json

# Google Sheets integration mode: 'none', 'input', 'output', or 'both'
export GSHEETS_MODE=both

# Google Sheet IDs (get from the sheet URL)
export GSHEETS_INPUT_SHEET_ID=your_input_sheet_id
export GSHEETS_OUTPUT_SHEET_ID=your_output_sheet_id

# Optional: Tag/label for the run
export GSHEETS_TAG="production-run-2025"
```

Security Best Practices:

- Store the service account JSON file in a secure location
- Add the JSON file to `.gitignore` to prevent accidental commits
- Use environment variables or secrets management (e.g., Kubernetes secrets) in production
- Regularly rotate service account keys if compromised
- Use the principle of least privilege: only grant the minimum permissions needed (Viewer for read-only, Editor for read-write)
When `gsheets-mode` is set to `input` or `both`, the script will:

- Read test data from Google Sheets tabs (`C_Sheet` and `Go_Sheet`)
- Write data to CSV files (`prodsec_expected_results_*.csv`) to ensure CSV files are synchronized with Google Sheets (source of truth)
- Generate `scan_generated.json` in the input directory from the Google Sheets data
Input Sheet Format: The input sheets should have the following columns:

- `vuln_id`: CVE identifier (e.g., `CVE-2024-5535`)
- `truth_category`: Expected category (`Exploitable` or `Not Exploitable`)
- `truth_label`: Expected label (e.g., `TP-1`, `TN-5`, `TN`). See Label System Reference for a complete list of labels and their meanings.
- `image_name`: Container image name (e.g., `registry.redhat.io/openshift4/ose-hypershift-rhel9`)
- `image_tag`: Container image tag (e.g., `v4.17.0-202503101435`)
- `image_digest`: Container image digest (e.g., `sha256:4b522848b679798e9a5d309089555b27d0e997398ae3efe91050916964bd6c55`)
- `source_location`: Git repository URL (e.g., `https://github.com/postgres/postgres`)
- `commit_id`: Git commit ID or ref (e.g., `REL_13_14`)
- `use_sbom`: Boolean flag indicating whether to use SBOM (`true` or `false`)
- `sbom_file`: Path to SBOM file relative to the `config/sboms/` directory (e.g., `postgresql-13-1-196.1724180180.sbom`)
Note: All required data (including git repository, commit ID, and SBOM information) must be present in the Google Sheets. The script does not extract this information from container images, so no registry credentials or skopeo are required.
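Translating one sheet row (with the column names listed above) into a `scan.json` test entry might look like the following sketch; the row-to-test mapping shown here is an assumption about the shape of the transformation, not the script's exact code:

```python
def row_to_test(row: dict, language: str) -> dict:
    """Translate one Google Sheets row into a scan.json 'tests' entry."""
    return {
        "language": language,
        "vuln_id": row["vuln_id"],
        "image": {"name": row["image_name"], "tag": row["image_tag"]},
        "git": {"repo": row["source_location"], "ref": row["commit_id"]},
        # Sheet cells may hold the boolean as a string ("true"/"TRUE") or a bool
        "use_sbom": str(row["use_sbom"]).strip().lower() == "true",
        "sbom_file": row.get("sbom_file", ""),
    }

row = {
    "vuln_id": "CVE-2025-1094",
    "image_name": "registry.redhat.io/rhel8/postgresql-13",
    "image_tag": "1-196.1724180180",
    "source_location": "https://github.com/postgres/postgres",
    "commit_id": "REL_13_14",
    "use_sbom": "TRUE",
    "sbom_file": "postgresql-13-1-196.1724180180.sbom",
}
test_entry = row_to_test(row, "c")
```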
Example:

```bash
python src/vulnerability_main_automation.py \
    --gsheets-mode input \
    --gsheets-input-sheet-id 1abc2def3ghi4jkl5mno6pqr \
    --gsheets-service-account-file /path/to/service-account.json \
    --url http://localhost:26466/generate
```

When `gsheets-mode` is set to `output` or `both`, the script will:

- Write analysis results (confusion matrix metrics) to the `raw` tab in the output sheet
- Include per-ecosystem metrics with timestamps and tags
Output Sheet Format:

The output sheet (`raw` tab) will contain the following columns:

- `date`: Timestamp of the analysis run
- `ecosystem`: Ecosystem name (e.g., `c`, `go`)
- `tag`: User-provided tag/label
- `Accuracy`, `Precision`, `Recall`, `F1_Score`: Performance metrics
- `TP`, `TN`, `FP`, `FN`: Confusion matrix values
- `Total`: Total number of test cases
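The four performance metrics in those columns follow directly from the confusion-matrix counts. The README says the script uses scikit-learn for this; the hand-rolled version below is only to show the arithmetic behind one output row (`metrics_from_counts` is a hypothetical name):

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "Accuracy": (tp + tn) / total if total else 0.0,
        "Precision": precision,
        "Recall": recall,
        "F1_Score": f1,
        "TP": tp, "TN": tn, "FP": fp, "FN": fn, "Total": total,
    }

# e.g. 8 true positives, 5 true negatives, 2 false positives, 1 false negative
row = metrics_from_counts(8, 5, 2, 1)
```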
Example:

```bash
python src/vulnerability_main_automation.py \
    --gsheets-mode output \
    --gsheets-output-sheet-id 1abc2def3ghi4jkl5mno6pqr \
    --gsheets-service-account-file /path/to/service-account.json \
    --gsheets-tag "v1.0.0"
```

To use both input and output modes:

```bash
python src/vulnerability_main_automation.py \
    --gsheets-mode both \
    --gsheets-input-sheet-id 1abc2def3ghi4jkl5mno6pqr \
    --gsheets-output-sheet-id 7stu8vwx9yza0bcd1efg2hij \
    --gsheets-service-account-file /path/to/service-account.json \
    --gsheets-tag "production-run-2025"
```

The script handles various error conditions:
- Missing input files
- Invalid JSON in input or response
- HTTP errors (4xx, 5xx)
- Connection errors
- Timeouts
- File I/O errors
- Google Sheets API errors
- Missing service account credentials
- Token expiration (automatically handled by refreshing credentials)
- Stale connection issues (automatically handled by resetting spreadsheet references)
All errors are logged with appropriate detail levels.
The script uses Python's logging module; the level is controlled by `--log-level` (or the `LOG_LEVEL` env var). Logs include:
- File operations (read/write)
- HTTP request details
- Error messages with context
- Success confirmations
- Google Sheets operations (read/write)
- Token refresh operations
The script generates several output files:

- Result JSON files: `{scan_id}_{vuln_id}_{iteration}.json` - Individual scan results
- Extracted data CSV: `extracted_data.csv` - Extracted metrics from all result files
- Merged data CSV: `merged_data.csv` - Analysis results with confusion matrix metrics
- Archived reports: `archive/report_{timestamp}.tar` - Timestamped archives of result files
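The archiving step can be sketched with the stdlib `tarfile` module. The `archive/report_{timestamp}.tar` naming comes from this README; the helper itself, including the timestamp format, is illustrative:

```python
import tarfile
import time
from pathlib import Path

def archive_reports(reports_dir: Path) -> Path:
    """Tar all result JSON files into archive/report_{timestamp}.tar,
    then delete the originals, mirroring the cleanup step."""
    archive_dir = reports_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    timestamp = time.strftime("%Y%m%d_%H%M%S")  # assumed format
    tar_path = archive_dir / f"report_{timestamp}.tar"
    results = sorted(reports_dir.glob("*.json"))
    with tarfile.open(tar_path, "w") as tar:
        for result in results:
            tar.add(result, arcname=result.name)
    for result in results:
        result.unlink()  # originals are removed once archived
    return tar_path
```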
The container is based on `registry.redhat.io/ubi9/python-39`, which provides:
- Red Hat Universal Base Image 9
- Python 3.9
- Security updates and support
This script is provided as-is for vulnerability automation testing.