GeoGuard AI is a prototype system that uses Large Language Models (LLMs) to automatically flag software features that require geo-specific legal compliance logic. It turns regulatory detection from a manual, error-prone process into a traceable, auditable, and proactive workflow.
Live Demo: https://geoguard-t4.streamlit.app/
Demo Video: [Link to your YouTube Demo Video]
- Problem Statement
- Our Solution
- Key Features & Functionality
- Tech Stack & Assets
- Local Setup & Usage
- Running Scripts
- Project Structure
As modern tech companies operate globally, every product feature must dynamically satisfy dozens of geographic regulations – from Brazil's data localization to GDPR. It is crucial to have automated visibility into key questions such as: "Does this feature require dedicated logic to comply with region-specific legal obligations?"
Without it, potential risks include legal exposure from undetected compliance gaps, reactive firefighting when auditors inquire, and massive manual overhead in scaling global feature rollouts. The challenge is to build a prototype system that utilizes LLM capabilities to flag these features, turning regulatory detection from a blind spot into a traceable, auditable output.
GeoGuard AI is an interactive web application designed for product managers and legal teams. It provides a centralized dashboard to manage software features and their compliance status.
Instead of relying on manual reviews, a user can input their feature's documentation (Title, Description, PRD, TRD), and our system leverages the Google Gemini model to perform an instant analysis. The AI, augmented with a curated and dynamically manageable knowledge base, determines if the feature requires geo-specific logic, provides clear reasoning, and cites the relevant law.
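As a rough illustration of this flow, the sketch below assembles a single prompt from a feature's documents and a small list of knowledge-base rules. Every name in it (`LegalRule`, `build_prompt`, the prompt wording) is hypothetical and not taken from the project's `ai_core.py`; the real prompt is likely more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalRule:
    """One knowledge-base entry (hypothetical shape, not the real schema)."""
    name: str
    region: str
    summary: str


def build_prompt(title: str, description: str, prd: str, trd: str,
                 rules: list[LegalRule]) -> str:
    """Combine feature artifacts and knowledge-base rules into one prompt."""
    # Render each rule as a bullet; fall back to a placeholder if none exist.
    rule_lines = "\n".join(
        f"- {r.name} ({r.region}): {r.summary}" for r in rules
    ) or "- (no rules configured)"
    return (
        "You are a compliance analyst.\n"
        "Known regulations:\n"
        f"{rule_lines}\n\n"
        "Feature under review:\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"PRD: {prd}\n"
        f"TRD: {trd}\n\n"
        "Does this feature require geo-specific compliance logic? "
        "Answer yes or no, explain your reasoning, and cite the relevant rule."
    )
```

The returned string is what would be sent to the model. Because the rules are passed in as data, editing the knowledge base changes the prompt without any code change.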
Crucially, every scan is saved as an immutable snapshot to a cloud-hosted Supabase database, creating a persistent, auditable history for each feature. This allows teams to track how a feature's compliance needs evolve and provides a clear evidence trail for regulatory inquiries.
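One way to picture such a snapshot is a frozen record that copies the feature text at scan time and stores a hash of it. This is a minimal sketch under stated assumptions: the field names, `ScanSnapshot`, and `make_snapshot` are invented for illustration, and the actual Supabase table layout is not shown in this README.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ScanSnapshot:
    feature_id: int
    title: str
    description: str
    needs_geo_logic: bool
    reasoning: str
    scanned_at: str
    content_hash: str


def make_snapshot(feature_id: int, title: str, description: str,
                  needs_geo_logic: bool, reasoning: str) -> ScanSnapshot:
    """Copy the feature text as it is right now and fingerprint it."""
    # Hash only the feature text, so identical inputs give identical hashes
    # regardless of what the model answered.
    payload = json.dumps({"title": title, "description": description},
                         sort_keys=True).encode("utf-8")
    return ScanSnapshot(
        feature_id=feature_id,
        title=title,
        description=description,
        needs_geo_logic=needs_geo_logic,
        reasoning=reasoning,
        scanned_at=datetime.now(timezone.utc).isoformat(),
        content_hash=hashlib.sha256(payload).hexdigest(),
    )
```

`asdict(snapshot)` yields a plain dictionary that could be stored as one database row. Comparing `content_hash` across scans shows whether the feature text changed between them.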
- AI-Powered Compliance Analysis: Uses the Google Gemini model with a structured prompt and a "Simplified RAG" approach (the model is augmented with the curated knowledge base) to analyze feature artifacts.
- Centralized Cloud Database: All features, scans, legal rules, and terminology are stored in a robust, cloud-hosted Supabase PostgreSQL database.
- Dynamic Knowledge Base Management: A built-in settings page allows administrators to add, edit, and delete legal rules and internal terminology directly from the UI, keeping the AI's knowledge base up-to-date without code changes.
- Interactive Feature Dashboard: A user-friendly Streamlit interface allows users to create, search, filter, and bulk-manage features.
- Batch Feature Upload: Efficiently import multiple features at once from a CSV file.
- Immutable Scan Snapshots: When a scan is performed, the system saves a complete snapshot of the feature's text at that moment, ensuring the audit trail is accurate.
- Objective Performance Evaluation: Includes a standalone script (evaluate.py) to test the LLM's accuracy against a ground-truth dataset, enabling data-driven improvements.
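The batch upload feature above implies some validation of the incoming CSV before rows are stored. The sketch below shows one possible shape for that step. It uses the standard-library `csv` module rather than pandas (which the project lists for CSV processing) to stay dependency-free, and the column names `title` and `description` are assumptions, not the app's documented format.

```python
import csv
import io

REQUIRED_COLUMNS = ("title", "description")  # assumed column names


def parse_feature_csv(text: str) -> tuple[list[dict], list[str]]:
    """Return (valid_rows, errors) for CSV text that starts with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [], [f"missing column(s): {', '.join(missing)}"]

    rows, errors = [], []
    # Number data rows from 1 so error messages can point at a specific row.
    for row_no, row in enumerate(reader, start=1):
        title = (row.get("title") or "").strip()
        if not title:
            errors.append(f"row {row_no}: empty title")
            continue  # skip invalid rows but keep processing the rest
        rows.append({
            "title": title,
            "description": (row.get("description") or "").strip(),
        })
    return rows, errors
```

Returning errors alongside the valid rows lets a UI report exactly which rows were skipped instead of rejecting the whole file.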
- Language: Python 3.12
- Virtual Environment: venv
- Containerization: Docker
- Version Control: Git & GitHub
- Google Gemini API: The core of our analysis engine, using the gemini-2.0-flash model.
- Supabase: Cloud-hosted PostgreSQL database and backend for all application data.
- streamlit: For building the interactive web application UI.
- supabase: The official Python client for interacting with the Supabase database.
- google-generativeai: The official Python SDK for the Gemini API.
- pandas: Used for CSV processing (batch uploads) and in our evaluation script.
- python-dotenv: For managing environment variables securely.
- scikit-learn: Used in our evaluation script to generate a comprehensive classification report.
You can run this application on your local machine by following these steps.
- Clone the repository:
    git clone <your-repo-url>
    cd new-geoguard
- Set up Environment Variables:
  - Create a file named .env by copying the template: cp .env.template .env
  - Edit the .env file and add your API keys and database credentials.
- Build and start the container:
    docker-compose up --build -d
- Access the application at http://localhost:8501
- To stop the application:
    docker-compose down
- Alternatively, to run without Docker, create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- ⚠️ Set up your Environment Variables:
  - Create a file named .env in the root directory.
  - Add your credentials to this file in the following format:
        # .env
        GEMINI_API_KEY="your-gemini-api-key"
        SUPABASE_URL="https://your-project-ref.supabase.co"
        SUPABASE_KEY="your-supabase-service-role-key"
  - The app will not run without these variables.
- Run the application:
    streamlit run app.py
  Your browser should open with the application running.
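Since the app will not run without the three variables from the setup steps, a small pre-flight check can give a clearer error than a failed API call. The following is a sketch only: `missing_env_vars` is a hypothetical helper, not a function from this repository, and it assumes the variables are already in the process environment (for example after python-dotenv's `load_dotenv()` has read the .env file).

```python
import os

# Variable names taken from the .env format in the setup steps.
REQUIRED_VARS = ("GEMINI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    # Accept any mapping so the check can be exercised without touching
    # the real environment; default to os.environ.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS
            if not (env.get(name) or "").strip()]
```

Calling `missing_env_vars()` at startup and stopping with a message that lists the returned names would surface a configuration problem before any request is made.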
The project includes scripts for evaluation and data generation.
To objectively measure the performance of the AI model against our test dataset, run the evaluation script from your terminal:
    python evaluate.py
This will output a full classification report and a confusion matrix.
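To make those outputs concrete, the sketch below computes the four confusion-matrix cells plus precision and recall for a binary "needs geo-specific logic" label. It is a plain-Python stand-in written for illustration, not the code in `evaluate.py` (which, per the tech stack, relies on scikit-learn), and the function names are hypothetical.

```python
def confusion_counts(y_true: list[bool], y_pred: list[bool]) -> dict[str, int]:
    """Count true/false positives and negatives for a binary label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1      # correctly flagged
        elif pred:
            counts["fp"] += 1      # flagged, but ground truth says no
        elif truth:
            counts["fn"] += 1      # missed a feature that needed flagging
        else:
            counts["tn"] += 1      # correctly left unflagged
    return counts


def precision_recall(counts: dict[str, int]) -> tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For a compliance flagger, recall is usually the number to watch: a false negative is a feature that needed geo-specific handling and was not flagged.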
To run the AI analysis on a sample dataset (sample-dataset/sample_data.csv) and generate a results file, use:
    python generate_results.py
The output will be saved to sample-dataset/sample_data_results.csv.
new-geoguard/
├── .env.template # Template for environment variables
├── Dockerfile
├── docker-compose.yml
├── README.md
├── app.py # Main Streamlit application
├── evaluate.py # Standalone script for testing the LLM
├── generate_results.py # Script to generate results from sample data
├── requirements.txt
├── data/
│ └── test_data.csv # Ground truth for the evaluation script
├── sample-dataset/
│ ├── sample_data.csv # Sample input data for generation
│ └── sample_data_results.csv # Generated output
└── src/
    ├── __init__.py
    ├── ai_core.py # All LLM-related logic (prompting, Gemini calls)
    └── db_utils.py # Functions for interacting