Serverless ML pipeline for analyzing before/after satellite imagery using AWS services. Designed as a workshop for UCLA OARC demonstrating serverless ML workflows.
- Accepts before/after satellite image pairs via S3 JSON inputs
- Uses SAM3 (Segment Anything Model 3) on SageMaker for object segmentation
- Compares images to identify changes (red = destroyed, green = survived)
- Uses Claude Opus 4.5 via Bedrock to analyze comparison images
- Produces styled markdown reports saved to S3
All settings are in cdk.json. Edit this file before deploying:
{
"context": {
"region": "us-west-2",
"bucket_name": "oarc-ws-image-processing-pipeline",
"endpoint_name": "oarc-ws-sam3-endpoint",
...
}
}Scripts in scripts/ automatically read from cdk.json — no other config needed.
# Install AWS CDK CLI (if not already installed)
npm install -g aws-cdk
# Bootstrap your AWS account for CDK (one-time setup per account/region)
cdk bootstrap aws://ACCOUNT-ID/us-west-2# Install dependencies
pip install -r requirements.txt
# Authenticate Docker to AWS Deep Learning Containers ECR registry
# 763104351884 is AWS's official account ID for Deep Learning Container images
# Required to pull the PyTorch SageMaker base image during CDK deployment
# See: https://aws.github.io/deep-learning-containers/reference/available_images/#pytorch-inference
aws ecr get-login-password --region us-west-2 | \
docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com
# Deploy infrastructure (OarcWsStorageStack + OarcWsPipelineStack)
# This builds and deploys Lambda functions and SageMaker model container
cdk deploy --all
# Start SageMaker endpoint (costs $0.736/hr)
python scripts/deploy_endpoint.py create
# Trigger the pipeline and watch for results
python scripts/run_pipeline.py
# Check endpoint status, cost, and auto-shutdown timer
python scripts/deploy_endpoint.py status
# Clean up when done
python scripts/deploy_endpoint.py delete
cdk destroy --allstacks/
storage_stack.py # S3 bucket (persists across pipeline redeployments)
pipeline_stack.py # Lambda, Step Functions, EventBridge, monitoring
lambda_functions/ # Processor and endpoint monitor Lambda code (Docker)
sagemaker/sam3/ # SageMaker SAM3 model container (Docker, 3.4GB)
scripts/
config.py # Reads config from cdk.json (do not edit)
deploy_endpoint.py # Create/delete/status SageMaker endpoint
run_pipeline.py # Upload test inputs and watch for results
files/
images/ # Sample before/after satellite images
inputs/ # Sample JSON input files (1.json, 2.json, 3.json)
reports/ # Generated markdown reports (output)
diagrams/ # Architecture diagrams and documentation
S3 Upload → EventBridge → Step Functions → SageMaker (async) → Lambda → Bedrock → S3 Report
Event-driven workflow:
- User uploads JSON to S3
inputs/→ EventBridge detects ObjectCreated event - Step Functions reads JSON, creates payload, invokes SageMaker async endpoint
- SageMaker processes images with SAM3, writes comparison to S3
compared/ - SageMaker writes output to S3
async-out/→ triggers EventBridge again - Step Functions invokes Lambda processor
- Lambda reads comparison image, calls Bedrock Claude for analysis
- Lambda generates styled markdown report, saves to S3
markdown/
The SageMaker endpoint costs $0.736/hr ($530/month if left running). Three safeguards:
- Auto-shutdown after 1 hour of inactivity (CloudWatch alarm on
InvocationsProcessed) - Daily cleanup at 2 AM UTC (EventBridge schedule)
- Cost warnings displayed after every
cdk deploy
Check status anytime:
python scripts/deploy_endpoint.py status| Service | Purpose |
|---|---|
| S3 | Storage for images, model, and outputs |
| Lambda | Post-processing (Bedrock + report generation) |
| Step Functions | Workflow orchestration |
| SageMaker | ML inference (SAM3 async endpoint) |
| Bedrock | LLM analysis (Claude Opus 4.5) |
| EventBridge | S3 event routing + scheduled cleanup |
| CloudWatch | Endpoint monitoring + auto-shutdown alarm |