This project consists of a complete data processing pipeline using NATS JetStream for messaging between microservices, with support for both local Docker Compose deployment and cloud-native Kubernetes deployment on Google Kubernetes Engine (GKE).
Note: This repository uses Git submodules. Clone with:

```bash
git clone --recurse-submodules https://github.com/richardr1126/k8s-datacenter-project.git
```

or, if already cloned, run:

```bash
git submodule update --init --recursive
```

See Submodule Repositories below for links to individual service repositories.
The project includes a cost-optimized GKE cluster manager for production deployments:
- Cluster Type: Single zonal cluster (us-central1-b)
- Node Pools:
  - Default Pool: t2d-standard-2 (2 vCPUs, 8GB RAM) for general workloads
  - ML Pool: n2d-highcpu-4 (4 vCPUs, 4GB RAM) for sentiment analysis workloads
- Cost Optimization: Spot instances (60-91% savings), private nodes with Cloud NAT
- Estimated Cost: ~$40-67/month with spot instances
- NATS Server: Message broker with JetStream persistence
- NATS Firehose Ingest: Ingests data from external sources into NATS streams
- NATS Stream Processor: Processes messages with sentiment analysis
- Mock Ingest: Generates test data for development
- Sentiment Web UI: Real-time visualization dashboard (Next.js)
Submodule Repositories:

- bsky-sentiment-web - Real-time visualization dashboard (Next.js)
- gke-cluster - GKE cluster management and Helm charts
- nats-firehose-ingest - Firehose data ingestion service
- nats-stream-processor - Sentiment analysis processor
```bash
# Start core services
docker compose up --build

# Include the mock data generator
docker compose --profile mock up --build

# Include NATS CLI tools
docker compose --profile tools up --build

# Include both profiles
docker compose --profile mock --profile tools up --build
```

Service endpoints:

- NATS Server:
  - Client: localhost:4222
  - HTTP Monitoring: localhost:8222
  - Cluster: localhost:6222
- Firehose Ingest: http://localhost:8081
  - Health: http://localhost:8081/health
  - Metrics: http://localhost:8081/metrics
- Stream Processor 0: http://localhost:8082
  - Health: http://localhost:8082/health
  - Metrics: http://localhost:8082/metrics
- Stream Processor 1: http://localhost:8083
  - Health: http://localhost:8083/health
  - Metrics: http://localhost:8083/metrics
- Sentiment Web UI: http://localhost:3000
```bash
curl http://localhost:8081/health   # Firehose Ingest
curl http://localhost:8082/health   # Stream Processor 0
curl http://localhost:8083/health   # Stream Processor 1

curl http://localhost:8081/metrics  # Firehose Ingest metrics
curl http://localhost:8082/metrics  # Stream Processor 0 metrics
curl http://localhost:8083/metrics  # Stream Processor 1 metrics
```

- Web UI: http://localhost:8222
- CLI access: `docker compose exec nats-box nats`
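For scripting against the `/metrics` endpoints, the Prometheus text exposition format is simple to parse: one `name{labels} value` pair per line, with `#` lines as comments. A minimal Python sketch (the metric name in the sample is hypothetical, not taken from these services):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format exposition into {metric: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore malformed lines
    return metrics

sample = """\
# HELP messages_processed_total Hypothetical counter
# TYPE messages_processed_total counter
messages_processed_total{service="processor-0"} 1523
"""
print(parse_metrics(sample))
```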
```bash
# Start with tools profile
docker compose --profile tools up -d

# Connect to NATS box
docker compose exec nats-box sh

# Inside the container, you can use the NATS CLI:
nats stream list
nats stream info bluesky-posts-dev
nats stream info bluesky-posts-enriched-dev
nats sub "bluesky.posts.dev.>"
nats sub "bluesky.enriched.dev.>"
```

Message flow (development):

- Mock Ingest → publishes test messages to the `bluesky-posts-dev` stream with subject `bluesky.posts.dev.*`
- Firehose Ingest → consumes from the external firehose and publishes to the `bluesky-posts-dev` stream with subject `bluesky.posts.dev.*`
- Stream Processors (0 & 1) → consume from the `bluesky-posts-dev` stream (load-balanced), add sentiment/topic analysis, and publish to the `bluesky-posts-enriched-dev` stream with subject `bluesky.enriched.dev.*`
- Sentiment Web UI → subscribes to the enriched stream and displays real-time updates
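The subject filters above use NATS wildcards: `*` matches exactly one dot-separated token, while `>` matches one or more trailing tokens. A small Python illustration of those matching rules (not code from this repo):

```python
def subject_matches(filter_: str, subject: str) -> bool:
    """Check a subject against a NATS-style filter with * and > wildcards."""
    ftoks, stoks = filter_.split("."), subject.split(".")
    for i, ft in enumerate(ftoks):
        if ft == ">":            # > must match at least one remaining token
            return len(stoks) > i
        if i >= len(stoks):
            return False         # subject ran out of tokens
        if ft != "*" and ft != stoks[i]:
            return False         # literal token mismatch
    return len(ftoks) == len(stoks)

# "bluesky.posts.dev.>" matches any subject below the dev prefix
print(subject_matches("bluesky.posts.dev.>", "bluesky.posts.dev.abc123"))  # → True
print(subject_matches("bluesky.posts.dev.>", "bluesky.posts.dev"))         # → False
print(subject_matches("bluesky.posts.dev.*", "bluesky.posts.dev.abc123")) # → True
```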
Each service can still be developed independently using its own local docker-compose file:
```bash
# Firehose Ingest only
cd nats-firehose-ingest
docker compose up --build

# Stream Processor only
cd nats-stream-processor
docker compose up --build

# Sentiment Web UI only
cd bsky-sentiment-web
npm install
npm run dev
```

```bash
# View all logs
docker compose logs -f

# View specific service logs
docker compose logs -f nats-firehose-ingest
docker compose logs -f nats-stream-processor-0
docker compose logs -f nats-stream-processor-1
docker compose logs -f mock-ingest
docker compose logs -f bsky-sentiment-web
```

```bash
# Stop all services
docker compose down

# Remove volumes (clears data)
docker compose down -v

# Remove images
docker compose down --rmi all
```
- Install the Google Cloud SDK and authenticate:

  ```bash
  gcloud auth login
  gcloud auth application-default login
  gcloud config set project YOUR_PROJECT_ID
  ```

- Enable required APIs:

  ```bash
  gcloud services enable container.googleapis.com compute.googleapis.com
  ```

- Install the uv package manager and Python dependencies:

  ```bash
  # uv is the recommended way to install dependencies
  curl -LsSf https://astral.sh/uv/install.sh | sh

  # Install dependencies and activate the environment
  uv sync --frozen
  source .venv/bin/activate
  ```

- Create `.env.prod` files for each microservice:

  IMPORTANT: Before running `./create-all.sh` or deploying with Helm, you must create `.env.prod` files in each microservice directory. These files contain environment variables that will be loaded into Kubernetes secrets.

  ```bash
  # Create .env.prod in each service directory:
  cp nats-firehose-ingest/.env.example nats-firehose-ingest/.env.prod  # Edit with your values
  cp nats-stream-processor/.env.example nats-stream-processor/.env.prod  # Edit with your values
  cp bsky-sentiment-web/.env.example bsky-sentiment-web/.env.prod  # Edit with your values
  ```
Environment file comparisons:

nats-firehose-ingest:

| Variable | .env.example | .env.prod | Notes |
|---|---|---|---|
| NATS_STREAM_NUM_REPLICAS | 1 | 3 | Increased for HA in production |
| All others | Same as .env.prod | See .env.example | Same for dev and prod |

nats-stream-processor:

| Variable | .env.example | .env.prod | Notes |
|---|---|---|---|
| NUM_STREAM_REPLICAS | 1 | 3 | Increased for HA in production |
| SENTIMENT_MODEL_CACHE_DIR | ./models/sentiment | /var/cache/models/sentiment | Absolute path for containerized env |
| TOPIC_MODEL_CACHE_DIR | ./models/topics | /var/cache/models/topics | Absolute path for containerized env |
| SENTIMENT_CONFIDENCE_THRESHOLD | 0.4 | 0.3 | Lower threshold in production for more detections |
| All others | Same as .env.prod | See .env.example | Same for dev and prod |

bsky-sentiment-web:

| Variable | .env.example | .env.prod | Notes |
|---|---|---|---|
| NATS_URL | Examples provided | nats://nats.nats.svc.cluster.local:4222 | K8s DNS for production |
| OUTPUT_STREAM | Examples provided | bluesky-posts-enriched | Match processor output stream |
| OUTPUT_SUBJECT | Examples provided | bluesky.enriched | Match processor output subject |

If these files are missing, `./create-all.sh` and the individual `./create-secrets.sh` scripts will fail.
Before running: ensure all `.env.prod` files are created in each microservice directory (see Prerequisites).

Deploy the entire pipeline with a single command:

```bash
# Run the complete setup
./create-all.sh
```

This script automatically:

- Creates the GKE cluster with Cloud NAT setup
- Connects to the cluster
- Installs all Kubernetes applications (NATS, Prometheus, CockroachDB, etc.)
- Creates Kubernetes secrets from `.env.prod` files in each microservice
- Deploys all microservices (firehose ingest, stream processor, sentiment web UI)
```bash
cd gke-cluster

# Create cluster with default name and spot instances
uv run gke-cluster.py create

# Create cluster with custom name
uv run gke-cluster.py create --name my-cluster

# Create cluster without spot instances (more expensive but more reliable)
uv run gke-cluster.py create --no-spot
```

Connect to the cluster and verify the nodes:

```bash
gcloud container clusters get-credentials cost-optimized-cluster \
  --zone us-central1-b --project YOUR_PROJECT_ID

# Verify nodes
kubectl get nodes

# Check node pools
kubectl get nodes --show-labels | grep pool
```

After your cluster is running, deploy the services using Helm charts:
```bash
cd gke-cluster/helm

# Run installation script, which:
# 1. Installs cert-manager, NGINX Gateway Fabric, and cluster issuers
# 2. Installs kube-prometheus-stack w/ Grafana for monitoring
# 3. Installs prometheus-adapter for custom metrics in K8s HPA
# 4. Installs NATS cluster with JetStream
# 5. Installs CockroachDB for persistent storage
./install-apps.sh
```

For each service, create secrets and deploy using Helm:
Firehose Ingest:

```bash
cd nats-firehose-ingest/charts
./create-secrets.sh
helm upgrade --install nats-firehose-ingest ./nats-firehose-ingest \
  --set image.tag=latest
```

Stream Processor:

```bash
cd nats-stream-processor/charts
./create-secrets.sh
helm upgrade --install nats-stream-processor ./nats-stream-processor \
  --set image.tag=latest
```

Sentiment Web UI:

```bash
cd bsky-sentiment-web/charts
./create-secrets.sh
helm upgrade --install bsky-sentiment-web ./bsky-sentiment-web \
  --set image.tag=latest
```

Helm release management:

```bash
# List all releases
helm list --all-namespaces

# Check release status
helm status nats-firehose-ingest --namespace default

# View release values
helm get values nats-firehose-ingest --namespace default

# Upgrade a release
helm upgrade nats-firehose-ingest ./nats-firehose-ingest \
  --namespace default \
  --values ./nats-firehose-ingest/values.yaml

# Rollback a release
helm rollback nats-firehose-ingest 1 --namespace default

# Uninstall a release
helm uninstall nats-firehose-ingest --namespace default

# Dry-run to see what would be deployed
helm upgrade --install nats-firehose-ingest ./nats-firehose-ingest \
  --namespace default \
  --dry-run --debug
```

Each microservice has a create-secrets.sh script in its charts/ directory. These scripts read from `.env.prod` files and create Kubernetes secrets.
REQUIRED: Each microservice must have a `.env.prod` file in its root directory for the secrets script to work.
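Conceptually, a create-secrets.sh script maps the KEY=VALUE lines of a `.env` file onto the data keys of a Kubernetes Secret. The actual scripts are shell, but the parsing step can be sketched in a few lines of Python (illustrative only; the sample variable names match the tables above):

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # strip optional surrounding quotes from the value
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

sample = """\
# NATS connection
NATS_URL=nats://nats.nats.svc.cluster.local:4222
NUM_STREAM_REPLICAS=3
"""
print(parse_env_file(sample))
```

Each resulting key/value pair becomes one entry in the generated Secret's `data`.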
```bash
# Basic usage (uses .env.prod by default)
cd nats-firehose-ingest/charts
bash create-secrets.sh

# Specify custom namespace
bash create-secrets.sh --namespace my-namespace

# Specify custom secret name
bash create-secrets.sh --secret-name my-secret

# Specify custom .env file
bash create-secrets.sh --env-file /path/to/.env

# Dry-run to see what would be created
bash create-secrets.sh --dry-run

# Combine options
bash create-secrets.sh \
  --namespace production \
  --secret-name app-secrets \
  --env-file .env.production \
  --dry-run
```

Example for each microservice:

```bash
# Firehose Ingest
cd nats-firehose-ingest/charts
bash create-secrets.sh  # requires nats-firehose-ingest/.env.prod

# Stream Processor
cd nats-stream-processor/charts
bash create-secrets.sh  # requires nats-stream-processor/.env.prod

# Sentiment Web UI
cd bsky-sentiment-web/charts
bash create-secrets.sh  # requires bsky-sentiment-web/.env.prod
```

Cluster scaling:

```bash
# Scale all pools to 5 nodes each
uv run gke-cluster.py scale --name cost-optimized-cluster --nodes 5

# Scale down to 0 nodes (save money, only pay for control plane)
uv run gke-cluster.py scale --nodes 0

# Scale specific pool only
uv run gke-cluster.py scale --name cost-optimized-cluster --nodes 3 --pool ml-pool
```

Cluster deletion:

```bash
# Delete cluster and all associated resources (PVCs, Cloud NAT, Cloud Router)
uv run gke-cluster.py delete

# Delete specific cluster
uv run gke-cluster.py delete --name my-cluster
```

List clusters:

```bash
uv run gke-cluster.py list
```

Services are deployed via Helm charts in the gke-cluster/helm/ directory:
- NATS cluster with JetStream
- Kube-Prometheus stack for monitoring
- CockroachDB for persistent storage
- Individual service deployments with auto-scaling
```bash
# Connect to NATS box in the cluster
kubectl exec -it deployment/nats-box -n nats -- sh

# Inside the container, you can use the NATS CLI:
nats stream list
nats stream info bluesky-posts
nats stream info bluesky-posts-enriched
nats sub "bluesky.posts.>"
nats sub "bluesky.enriched.>"
```

Message flow (production):

- Mock Ingest → publishes test messages to the `bluesky-posts` stream with subject `bluesky.posts.*`
- Firehose Ingest → consumes from the external firehose and publishes to the `bluesky-posts` stream with subject `bluesky.posts.*`
- Stream Processors (0 & 1) → consume from the `bluesky-posts` stream (load-balanced), add sentiment/topic analysis, and publish to the `bluesky-posts-enriched` stream with subject `bluesky.enriched.*`
- Sentiment Web UI → subscribes to the enriched stream and displays real-time updates
- Machine Type: t2d-standard-2 (2 vCPUs, 8GB RAM)
- Purpose: General workloads
- Disk: 20GB standard persistent disk
- Autoscaling: Manual (use scale command)
- Machine Type: n2d-highcpu-4 (4 vCPUs, 4GB RAM)
- Purpose: ML inference workloads (optimized for ONNX INT8 models)
- Disk: 50GB standard persistent disk
- Taint: `dedicated=ml:NoSchedule`
- Autoscaling: Enabled (0-6 nodes)
- GCFS: Enabled for image streaming
To deploy to the ML pool, add a toleration and node selector:

```yaml
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ml"
    effect: "NoSchedule"
nodeSelector:
  cloud.google.com/gke-nodepool: ml-pool
```

The cluster uses private nodes to save on external IP quota:
- Private Nodes: Nodes only have private IPs (no external IPs)
- Cloud NAT: Provides outbound internet access for:
- Pulling container images
- Accessing external APIs
- Downloading packages
- Security: Nodes are unreachable from the internet
- Cost: Adds ~$1-5/month for NAT
Benefits:
- ✅ No external IP quota needed (only LoadBalancer needs 1 external IP)
- ✅ Reduced attack surface
- ✅ All outbound traffic through NAT gateway
- ✅ Can scale to 9 nodes without external IP quota issues
- Savings: 60-91% off regular instance prices
- Trade-off: Can be terminated with 30-second notice
- Best for: Development, testing, fault-tolerant workloads
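Because spot nodes can disappear on 30 seconds' notice, workloads running on them should treat transient failures as routine and retry with backoff. A generic sketch of that pattern (not code from this repo; the `flaky` function simulates a preempted dependency):

```python
import time

def retry_with_backoff(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, re-raise the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Simulate a call that fails twice (as if the node was preempted), then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node preempted")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None))  # → ok
```

JetStream's durable consumers complement this: messages that were delivered but never acknowledged before a preemption are redelivered to the surviving replicas.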
- Default Pool (t2d-standard-2): 2 vCPUs, 8GB RAM
- ML Pool (n2d-highcpu-4): 4 vCPUs, 4GB RAM (compute-optimized)
- Total capacity: 30 vCPUs, 48GB RAM at full scale (3+6 nodes)
- Standard Persistent Disk: Cheapest disk option
- Sizes: 20GB default, 50GB for ML nodes
- Performance: Suitable for most development workloads
- PVC Disks: Automatically deleted with cluster
- Cloud NAT: Automatically removed with cluster
- No Orphaned Resources: All resources cleaned up on delete
- Default Pool (3 x t2d-standard-2): ~$20-33
- ML Pool (3 x n2d-highcpu-4): ~$15-25
- Persistent Disk (100GB standard): ~$4
- Cloud NAT: ~$1-5
- Total: ~$40-67/month
- Control plane: Free (single zonal cluster)
- Cloud NAT: ~$1/month (minimal with no traffic)
- Total: ~$1/month
Prices may vary by region and are subject to change. Spot instance pricing is typically 60-91% less than regular instances.
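The monthly total is just the sum of the component ranges listed above, which a quick Python check confirms (the per-component figures are this document's estimates, not live GCP pricing):

```python
# (low, high) monthly USD estimates from the list above
components = {
    "default_pool": (20, 33),    # 3 x t2d-standard-2 (spot)
    "ml_pool": (15, 25),         # 3 x n2d-highcpu-4 (spot)
    "persistent_disk": (4, 4),   # 100GB standard PD
    "cloud_nat": (1, 5),
}
low = sum(lo for lo, _ in components.values())
high = sum(hi for _, hi in components.values())
print(f"~${low}-{high}/month")  # → ~$40-67/month
```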
Raw message (input):

```json
{
  "uri": "at://test/12345",
  "text": "This is a great day!",
  "author": "test-user-67890",
  "timestamp": "2025-09-30T10:30:00Z"
}
```

Enriched message (output):

```json
{
  "uri": "at://test/12345",
  "text": "This is a great day!",
  "author": "test-user-67890",
  "timestamp": "2025-09-30T10:30:00Z",
  "sentiment": {
    "label": "POSITIVE",
    "score": 0.9234
  },
  "processed_at": "2025-09-30T10:30:01Z"
}
```

- Services not connecting to NATS: Check that NATS is healthy before dependent services start
- Port conflicts: The unified compose uses different ports (8081, 8082, 8083) to avoid conflicts
- Model download: Stream processors download AI models on first run, which may take time
- Memory usage: Sentiment analysis requires adequate memory allocation (3GB per processor)
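The raw-to-enriched transformation in the message schemas above can be sketched as a pure function (the sentiment label and score are placeholders; the real processor runs a model to produce them):

```python
import json
from datetime import datetime, timezone

def enrich(post: dict, label: str, score: float) -> dict:
    """Attach sentiment fields to a raw post, matching the enriched schema."""
    enriched = dict(post)  # copy; raw fields pass through unchanged
    enriched["sentiment"] = {"label": label, "score": score}
    enriched["processed_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return enriched

raw = {
    "uri": "at://test/12345",
    "text": "This is a great day!",
    "author": "test-user-67890",
    "timestamp": "2025-09-30T10:30:00Z",
}
print(json.dumps(enrich(raw, "POSITIVE", 0.9234), indent=2))
```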
- Authentication issues:

  ```bash
  gcloud auth application-default login
  gcloud config set project YOUR_PROJECT_ID
  ```

- API not enabled:

  ```bash
  gcloud services enable container.googleapis.com compute.googleapis.com
  ```

- Cluster not accessible: Verify private cluster configuration and Cloud NAT setup
```
.
├── README.md                        # This file
├── architecture.excalidraw.png      # Architecture diagram
├── docker-compose.yml               # Local development orchestration
├── gke-cluster/                     # GKE cluster management
│   ├── gke-cluster.py               # Cluster creation/management script
│   ├── README.md                    # GKE-specific documentation
│   ├── pyproject.toml               # Python dependencies
│   └── helm/                        # Kubernetes Helm charts
├── nats-firehose-ingest/            # Firehose data ingestion service
├── nats-stream-processor/           # Sentiment analysis processor
├── nats-stream-processor-topics/    # Topic classification processor
└── bsky-sentiment-web/              # Real-time visualization dashboard
```
The GKE cluster includes several security optimizations:
- Private nodes (no direct internet access to nodes)
- Cloud NAT for controlled outbound access
- Workload Identity for secure service access
- Container-Optimized OS (COS_CONTAINERD)
- Uses GKE defaults for RBAC and security policies
- Latest Kubernetes version
- Cost management enabled for usage tracking
