Production-grade AWS infrastructure for the workshop platform, featuring multi-account EKS clusters with Fargate, automated CI/CD pipelines, and complete environment isolation.
- Overview
- Architecture
- Prerequisites
- Bootstrap Process
- GitHub Setup
- Local Development
- CI/CD Pipelines
- Emergency Procedures
- Project Structure
This repository contains Terraform infrastructure code for deploying a production-grade Kubernetes platform on AWS. The platform uses:
- Amazon EKS with Fargate for serverless container orchestration
- Amazon ECR for container image registries with lifecycle policies
- Multi-account architecture for complete environment isolation
- Automated CI/CD with GitHub Actions
- Remote state management with S3 and DynamoDB
- Environment-specific configurations for dev, staging, and production
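Remote state of this kind is usually wired up through an S3 backend block. The sketch below is illustrative only: the bucket and table names follow the conventions used later in this README, the state `key` is an assumption, and since this repository passes the bucket via `-backend-config` at init time, the real `platform/backend.tf` likely omits some of these values.

```hcl
# Hypothetical sketch of platform/backend.tf for the dev account.
# Bucket/table names follow the Bootstrap and Troubleshooting sections;
# the "key" value is an assumption for illustration.
terraform {
  backend "s3" {
    bucket         = "workshop-ua-dev-terraform-state"
    key            = "platform/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}
```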
Each environment runs in a separate AWS account for maximum isolation and security:
```
Development Account (111111111111)
├─ S3 State Bucket
├─ DynamoDB Lock Table
├─ IAM Users (terraform-ci)
└─ EKS Cluster (workshop-eks-dev)

Staging Account (222222222222)
├─ S3 State Bucket
├─ DynamoDB Lock Table
├─ IAM Users (terraform-ci)
└─ EKS Cluster (workshop-eks-stg)

Production Account (333333333333)
├─ S3 State Bucket
├─ DynamoDB Lock Table
├─ IAM Users (terraform-ci)
└─ EKS Cluster (workshop-eks-prd)
```
- ✅ Complete isolation - Separate AWS accounts per environment
- ✅ Fargate-only - No EC2 nodes to manage
- ✅ Automated deployments - GitHub Actions CI/CD
- ✅ Cost-optimized - Dev/staging use a single NAT gateway
- ✅ Production-ready - Multi-AZ, full logging, HA configuration
Before you begin, ensure you have:
- AWS CLI (v2.x or later) - Installation Guide
- Terraform (>= 1.0) - Installation Guide
- kubectl - Installation Guide
- Git - For version control
- GitHub Account - With repository access
You'll need access to three AWS accounts:
- Development account
- Staging account
- Production account
Note: You can start with a single account and migrate to multi-account later, but separate accounts are highly recommended for production.
Your AWS user/role needs permissions to create:
- S3 buckets
- DynamoDB tables
- IAM users, roles, and policies
- VPCs and networking resources
- EKS clusters
- CloudWatch log groups
The bootstrap process must be completed once per AWS account to set up remote state management.
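The resources the bootstrap creates can be sketched roughly as follows. This is an illustrative sketch, not the contents of `terraform_init/main.tf`: names follow the conventions shown in this README, and the real module will include additional settings (encryption, public-access blocks, IAM policies).

```hcl
# Illustrative sketch of what terraform_init provisions per account.
variable "environment" { type = string } # dev | stg | prd

# Versioned S3 bucket for remote state
resource "aws_s3_bucket" "state" {
  bucket = "workshop-ua-${var.environment}-terraform-state"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# DynamoDB table for state locking (name taken from the Troubleshooting section)
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# CI user whose keys surface as the ci_access_key_id / ci_secret_access_key outputs
resource "aws_iam_user" "ci" {
  name = "terraform-ci"
}
```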
Configure AWS CLI for each account:
```bash
# For development account
aws configure --profile workshop-dev
# Enter: Access Key ID, Secret Access Key, Region (eu-west-1), Output format (json)

# For staging account
aws configure --profile workshop-stg

# For production account
aws configure --profile workshop-prd
```

Alternative (SSO Login):
```bash
aws sso login --profile workshop-dev
```

```bash
# Switch to development account
export AWS_PROFILE=workshop-dev

# Navigate to terraform_init directory
cd terraform_init

# Initialize Terraform
terraform init

# Review the plan
terraform plan
# Expected output: S3 bucket, DynamoDB table, IAM user with policies
# Look for green '+' signs indicating resources to be created

# Apply the configuration with the environment variable
terraform apply -var="environment=dev"
# Type 'yes' when prompted
# Creates bucket: workshop-ua-dev-terraform-state

# IMPORTANT: Save the outputs!
terraform output -raw ci_access_key_id > ../dev-access-key.txt
terraform output -raw ci_secret_access_key > ../dev-secret-key.txt

# The outputs contain:
# - S3 bucket name: workshop-ua-dev-terraform-state
# - DynamoDB table name for state locking
# - IAM user credentials for CI/CD
```

```bash
# Switch to staging account
export AWS_PROFILE=workshop-stg

# Run terraform in the same directory (it will create resources in the new account)
terraform init -reconfigure
terraform plan -var="environment=stg"
terraform apply -var="environment=stg"
# Creates bucket: workshop-ua-stg-terraform-state

# Save outputs
terraform output -raw ci_access_key_id > ../stg-access-key.txt
terraform output -raw ci_secret_access_key > ../stg-secret-key.txt
```

```bash
# Switch to production account
export AWS_PROFILE=workshop-prd

# Run terraform
terraform init -reconfigure
terraform plan -var="environment=prd"
terraform apply -var="environment=prd"
# Creates bucket: workshop-ua-prd-terraform-state

# Save outputs
terraform output -raw ci_access_key_id > ../prd-access-key.txt
terraform output -raw ci_secret_access_key > ../prd-secret-key.txt
```

```bash
# Add credentials to .gitignore (already done)
# NEVER commit these files to git

# Store credentials securely:
# 1. Add them to GitHub as environment secrets (see next section)
# 2. Delete the local files after setup:
rm ../dev-access-key.txt ../stg-access-key.txt ../prd-access-key.txt
rm ../dev-secret-key.txt ../stg-secret-key.txt ../prd-secret-key.txt
```

Configure the GitHub repository for automated CI/CD deployments.
- Navigate to your repository on GitHub
- Go to Settings → Environments
- Click New environment
- Create three environments: `dev`, `stg`, `production`

For each environment, add the AWS credentials:

- Go to Settings → Environments → dev
- Click Add secret
- Add two secrets:
  - Name: `AWS_ACCESS_KEY_ID`, Value: (from dev-access-key.txt)
  - Name: `AWS_SECRET_ACCESS_KEY`, Value: (from dev-secret-key.txt)
- Go to Settings → Environments → stg
- Add two secrets:
  - Name: `AWS_ACCESS_KEY_ID`, Value: (from stg-access-key.txt)
  - Name: `AWS_SECRET_ACCESS_KEY`, Value: (from stg-secret-key.txt)
- Go to Settings → Environments → production
- Add two secrets:
  - Name: `AWS_ACCESS_KEY_ID`, Value: (from prd-access-key.txt)
  - Name: `AWS_SECRET_ACCESS_KEY`, Value: (from prd-secret-key.txt)
For the production environment:
- Go to Settings → Environments → production
- Enable Required reviewers:
- Add team members who should approve production deployments
- Recommended: At least 2 reviewers
- Enable Wait timer (optional):
- Example: 5 minutes to allow for cancellation
- Configure Deployment branches:
  - Select "Selected branches"
  - Add rule: `main`
- Go to Settings → Actions → General
- Ensure "Allow all actions and reusable workflows" is selected
- Under "Workflow permissions":
- Select "Read and write permissions"
- Check "Allow GitHub Actions to create and approve pull requests"
```bash
# Navigate to platform directory
cd platform

# Choose your target environment
export AWS_PROFILE=workshop-dev  # or workshop-stg, workshop-prd

# Initialize Terraform with the environment-specific backend
terraform init -backend-config="bucket=workshop-ua-dev-terraform-state"

# Plan changes
terraform plan -var-file="environments/dev.tfvars"

# Apply changes (use with caution!)
terraform apply -var-file="environments/dev.tfvars"
```

Always format your code before committing:

```bash
cd platform
terraform fmt -recursive
```

```bash
# Switch to staging
export AWS_PROFILE=workshop-stg
terraform init -backend-config="bucket=workshop-ua-stg-terraform-state" -reconfigure
terraform plan -var-file="environments/stg.tfvars"

# Switch back to dev
export AWS_PROFILE=workshop-dev
terraform init -backend-config="bucket=workshop-ua-dev-terraform-state" -reconfigure
terraform plan -var-file="environments/dev.tfvars"
```

The platform includes an ECR module that creates container image registries for your projects. Each registry is provisioned with a lifecycle policy to manage image retention.
Add project names to the projects list in your environment tfvars file:
```hcl
# platform/environments/dev.tfvars
projects = ["spring-petshop", "my-api", "my-frontend"]
```

One ECR repository is created per project name.
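One way the platform might wire the `projects` list into the ECR module is a `for_each` over the list, sketched below. The module's variable name (`repository_name`) is an assumption; see `platform/ecr.tf` and `platform/modules/ecr/` for the real interface.

```hcl
# Hypothetical wiring in platform/ecr.tf: one module instance per project.
# "repository_name" is an assumed variable name for the module.
module "ecr" {
  source   = "./modules/ecr"
  for_each = toset(var.projects)

  repository_name = each.value
}
```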
Each repository enforces the following retention rules:
| Tag Pattern | Retention |
|---|---|
| `*RELEASE` | Kept indefinitely |
| `*SNAPSHOT` | Only the latest 5 images are kept |
Images with tags ending in RELEASE (e.g., v1.0.0-RELEASE) are never expired. Images with tags ending in SNAPSHOT (e.g., v1.0.0-SNAPSHOT) are automatically cleaned up, keeping only the 5 most recent.
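In Terraform terms, the SNAPSHOT rule could be expressed roughly as below. This is a sketch, not the module's actual code; note that RELEASE images are retained simply because they match no expiry rule.

```hcl
# Hypothetical sketch: expire all but the 5 newest *SNAPSHOT images.
# RELEASE-tagged images match no rule and are therefore kept indefinitely.
resource "aws_ecr_lifecycle_policy" "snapshots" {
  repository = aws_ecr_repository.this.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep only the latest 5 *SNAPSHOT images"
      selection = {
        tagStatus      = "tagged"
        tagPatternList = ["*SNAPSHOT"]
        countType      = "imageCountMoreThan"
        countNumber    = 5
      }
      action = { type = "expire" }
    }]
  })
}
```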
After applying, retrieve the ECR repository URLs and ARNs:
```bash
# Get all ECR repository URLs
terraform output ecr_repository_urls

# Example output:
# {
#   "spring-petshop" = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/spring-petshop"
# }

# Get all ECR repository ARNs
terraform output ecr_repository_arns
```

To push images to a registry:

```bash
# Authenticate Docker with ECR
aws ecr get-login-password --region eu-west-1 --profile workshop-dev | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Tag and push an image
docker tag my-app:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/spring-petshop:v1.0.0-RELEASE
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/spring-petshop:v1.0.0-RELEASE
```

For detailed module documentation, see platform/modules/ecr/README.md.
For each project in the projects list, the platform creates a dedicated IAM user with scoped permissions for CI/CD pipelines in separate repositories.
Each project gets:
- IAM User named `<cluster>-<project>-ci-user` (e.g., `workshop-eks-dev-spring-petshop-ci-user`)
- Access Keys for authenticating from CI/CD pipelines
- IAM Policy with:
  - ECR push permissions scoped to the project's repository
  - EKS describe permissions for configuring `kubectl`
- EKS Access Entry with `AmazonEKSEditPolicy` scoped to a namespace matching the project name
- Kubernetes Namespace matching the project name
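A scoped ECR-push policy of this shape might look like the following sketch. It is illustrative only; the real statements live in `platform/ci_users.tf`, and the module output name used for the repository ARN is an assumption.

```hcl
# Hypothetical sketch of the per-project CI policy: ECR push limited to the
# project's repository, plus eks:DescribeCluster for kubeconfig setup.
data "aws_iam_policy_document" "ci" {
  statement {
    sid       = "EcrAuth"
    actions   = ["ecr:GetAuthorizationToken"]
    resources = ["*"] # GetAuthorizationToken cannot be resource-scoped
  }

  statement {
    sid = "EcrPush"
    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:CompleteLayerUpload",
      "ecr:InitiateLayerUpload",
      "ecr:PutImage",
      "ecr:UploadLayerPart",
    ]
    # "repository_arn" is an assumed output name of the ECR module
    resources = [module.ecr["spring-petshop"].repository_arn]
  }

  statement {
    sid       = "EksDescribe"
    actions   = ["eks:DescribeCluster"]
    resources = ["*"]
  }
}
```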
After applying, retrieve the CI/CD credentials:
```bash
# Get all CI user ARNs
terraform output ci_user_arns

# Get access key IDs
terraform output ci_user_access_key_ids

# Get secret access keys (sensitive)
terraform output -json ci_user_secret_access_keys
```

Configure the access key and secret as secrets in your project's CI/CD pipeline, then:

```bash
# Authenticate with ECR
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin <account_id>.dkr.ecr.eu-west-1.amazonaws.com

# Push image
docker push <account_id>.dkr.ecr.eu-west-1.amazonaws.com/spring-petshop:v1.0.0-RELEASE

# Configure kubectl
aws eks update-kubeconfig --region eu-west-1 --name workshop-eks-dev

# Deploy to the project's namespace
kubectl apply -f deployment.yaml -n spring-petshop
```

The repository includes three automated workflows:
Workflow: .github/workflows/terraform-plan.yml
Triggers: When a PR is created or updated
Actions:
- ✅ Lint check (runs once)
- ✅ Plan for dev (parallel)
- ✅ Plan for stg (parallel)
- ✅ Plan for prd (parallel)
- 💬 Post 3 plan outputs as PR comments
Usage:
```bash
git checkout -b feature/add-namespace
# Make changes to platform/
git add .
git commit -m "Add new namespace for applications"
git push origin feature/add-namespace
# Create PR → Plans run automatically
```

Workflow: .github/workflows/terraform-apply.yml
Triggers: Push or merge to main branch
Actions:
- ✅ Deploy to dev
- ⏸️ Wait for success
- ✅ Deploy to stg
- ⏸️ Wait for success
- ✅ Deploy to prd (requires approval)
- ✅ Final summary
Sequential Flow:
```
main branch update
        ↓
    Deploy Dev
        ↓
     Success?
        ↓ Yes
    Deploy Stg
        ↓
     Success?
        ↓ Yes
Approve Prod? (manual)
        ↓ Yes
    Deploy Prd
        ↓
      Done!
```
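Sequential gating like this is typically built from `needs:` chains plus GitHub environments, where the production environment's required reviewers provide the manual approval pause. The sketch below illustrates the pattern; it is not the repository's actual workflow, and it omits the credential, init, and setup steps a real job would need.

```yaml
# Hypothetical sketch of the dev → stg → prd chain in terraform-apply.yml.
# Job names and steps are assumptions; credentials/init steps are omitted.
jobs:
  deploy-dev:
    environment: dev
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=platform apply -auto-approve -var-file=environments/dev.tfvars

  deploy-stg:
    needs: deploy-dev          # runs only if dev succeeded
    environment: stg
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=platform apply -auto-approve -var-file=environments/stg.tfvars

  deploy-prd:
    needs: deploy-stg          # runs only if stg succeeded
    environment: production    # pauses here for required-reviewer approval
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=platform apply -auto-approve -var-file=environments/prd.tfvars
```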
Workflow: .github/workflows/terraform-manual.yml
Triggers: Manual (on-demand)
Inputs:
- Environment (dev, stg, prd)
- Action (plan, apply)
- Branch (any branch name)
- Reason (audit trail)
Usage:
- Go to Actions → Terraform Manual/Emergency
- Click "Run workflow"
- Select parameters
- Provide reason for audit
- Execute
See Emergency Procedures for more details.
For critical production issues that can't wait for the normal PR process:
- Create hotfix branch:

  ```bash
  git checkout -b hotfix/critical-issue-123
  # Make minimal changes to fix the issue
  git commit -m "Fix critical production issue #123"
  git push origin hotfix/critical-issue-123
  ```
- Use manual workflow:
  - Navigate to Actions → Terraform Manual/Emergency
  - Click "Run workflow"
  - Environment: `prd`
  - Action: `apply`
  - Branch: `hotfix/critical-issue-123`
  - Reason: "Emergency fix for production incident #123"
- Follow up with PR:

  ```bash
  # After the emergency is resolved, create a PR for review
  gh pr create --base main --head hotfix/critical-issue-123
  ```
If a deployment causes issues:
- Identify previous working commit:

  ```bash
  git log platform/ --oneline
  ```

- Use manual workflow:
  - Environment: affected environment
  - Action: `apply`
  - Branch: `main` or a commit SHA
  - Reason: "Rolling back due to issue #456"
Before creating a PR, test changes in dev:
```bash
# Create feature branch
git checkout -b feature/test-changes

# Make changes
# ...

# Push branch
git push origin feature/test-changes

# Use manual workflow:
# - Environment: dev
# - Action: plan (or apply)
# - Branch: feature/test-changes
# - Reason: "Testing new feature before PR"
```

```
workshop-platform/
├── .github/
│   ├── workflows/
│   │   ├── terraform-plan.yml           # PR validation
│   │   ├── terraform-apply.yml          # Auto-deployment
│   │   ├── terraform-apply-reusable.yml # Shared logic
│   │   └── terraform-manual.yml         # Emergency workflow
│   └── CI_CD_SETUP.md                   # Detailed CI/CD docs
│
├── terraform_init/                      # Bootstrap (run once per account)
│   ├── main.tf                          # State bucket, DynamoDB, IAM
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
│
├── platform/                            # Main infrastructure
│   ├── environments/                    # Environment configs
│   │   ├── dev.tfvars
│   │   ├── stg.tfvars
│   │   └── prd.tfvars
│   │
│   ├── modules/                         # Reusable modules
│   │   └── ecr/                         # ECR registry module
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       ├── outputs.tf
│   │       └── README.md
│   │
│   ├── backend.tf                       # Remote state config
│   ├── provider.tf                      # Terraform providers
│   ├── vpc.tf                           # VPC and networking
│   ├── security-groups.tf               # Security groups
│   ├── iam.tf                           # IAM roles and policies
│   ├── eks.tf                           # EKS cluster
│   ├── ecr.tf                           # ECR registries
│   ├── ci_users.tf                      # Per-project CI/CD users
│   ├── fargate.tf                       # Fargate profiles
│   ├── helm-charts.tf                   # AWS LB Controller
│   ├── variables.tf                     # Input variables
│   ├── outputs.tf                       # Output values
│   ├── ENVIRONMENTS.md                  # Environment details
│   └── README.md                        # Platform docs
│
├── CLAUDE.md                            # Project guidelines
└── README.md                            # This file
```
- platform/environments/ - Environment-specific configurations
  - `dev.tfvars` - Development configuration
  - `stg.tfvars` - Staging configuration
  - `prd.tfvars` - Production configuration
- platform/README.md - Platform infrastructure details
- .github/CI_CD_SETUP.md - CI/CD pipeline guide
- ✅ Always run `terraform fmt` before committing
- ✅ Test changes in dev before promoting
- ✅ Use PRs for all normal changes
- ✅ Review terraform plans carefully
- ✅ Keep environment configs in sync (where applicable)
- ✅ Use descriptive commit messages
- ✅ Follow up emergency deployments with PRs
- ❌ Never commit AWS credentials
- ❌ Don't skip dev/staging when testing
- ❌ Don't manually modify production via console
- ❌ Don't force-push to main
- ❌ Don't share AWS credentials between environments
- ❌ Don't bypass the PR process except for emergencies
If you get "Error acquiring state lock":
```bash
# Check who has the lock
aws dynamodb scan --table-name terraform-state-locks --profile workshop-dev

# If the lock is stuck (use with caution):
cd platform
terraform force-unlock <LOCK_ID>
```

- Check the Actions tab for detailed logs
- Verify environment secrets are set correctly
- Ensure AWS credentials are valid
- Check Terraform formatting: `terraform fmt -check`
```bash
# Test credentials
aws sts get-caller-identity --profile workshop-dev

# Expected output:
# {
#   "UserId": "...",
#   "Account": "111111111111",
#   "Arn": "..."
# }
```

```bash
# Update kubeconfig
aws eks update-kubeconfig --region eu-west-1 --name workshop-eks-dev --profile workshop-dev

# Test access
kubectl get nodes
```

- Documentation: Check platform/README.md
- CI/CD Issues: See .github/CI_CD_SETUP.md
- Terraform Docs: terraform.io/docs
- AWS EKS Guide: docs.aws.amazon.com/eks
[Your License Here]
[Your Team/Contributors Here]