feat: add Netbird self-hosted VPN CDK app (control plane + routing peer) for shared [COM-143]#11
vanguille wants to merge 5 commits into
…er) for shared [COM-143] Self-hosted Netbird (WireGuard) for the autoguru-shared account, the planned Pritunl replacement. Two independent CDK stacks (C#) in netbird/: control plane and routing peer, each with a dedicated VPC, EIP, SSM-only role (no SSH), IMDSv2, encrypted EBS and CloudWatch auto-recovery. EC2 user-data is embedded from netbird/scripts. DNS lives in Cloudflare so the stacks only output the EIPs. Path-scoped GitHub workflow: cdk diff on PRs, manual deploy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Entra application (client) ID, tenant ID and audience in the Netbird control-plane user-data are public OIDC identifiers, not secrets (the client secret is pulled from Secrets Manager at runtime and is never committed). gitleaks' generic-api-key rule flags them on entropy. Add a value-scoped allowlist (not path-scoped, so a real secret in the same file is still caught) and wire the self-scan to use the config. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
configure-aws-credentials, cdk diff and cdk deploy now run only when the AWS_DEPLOY_ROLE_ARN repo secret is set. Until an admin wires it, the workflow runs the dotnet build/compile check and emits a warning instead of failing on missing credentials, so the draft PR check is not misleadingly red. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
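The gating described in this commit can be pictured as the shell logic below. This is a minimal sketch, not the workflow's actual YAML: the function name `aws_steps_enabled` is hypothetical, and the real workflow presumably expresses the same condition with step-level `if:` expressions. The `::warning::` prefix is the GitHub Actions workflow command that surfaces a warning annotation without failing the job.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): decide whether AWS-dependent steps run.
# Returns 0 when a role ARN is supplied. Otherwise prints a GitHub Actions
# warning annotation and returns 1, so the caller can skip instead of failing.
aws_steps_enabled() {
  if [ -n "${1:-}" ]; then
    return 0
  fi
  echo "::warning::AWS_DEPLOY_ROLE_ARN is not set; skipping cdk diff and deploy"
  return 1
}

# Assumed wiring inside a workflow step:
#   if aws_steps_enabled "${AWS_DEPLOY_ROLE_ARN:-}"; then
#     npx cdk diff
#   fi
```

Because the skip path is an ordinary `if` branch, the step still exits 0 and the draft PR check stays green while the secret is missing.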
@claude review this
Pull request overview
Adds a new AWS CDK (C#) application to provision a self-hosted Netbird (WireGuard) VPN in the autoguru-shared AWS account, covering both the control plane and a dedicated routing peer with stable EIPs, plus a path-scoped GitHub Actions workflow for diff/deploy.
Changes:
- Introduces two CDK stacks (`NetbirdControlPlaneStack`, `NetbirdRoutingPeerStack`) provisioning VPC + EC2 + EIP + auto-recovery + least-privilege secrets access.
- Adds embedded EC2 user-data scripts for control-plane bootstrap prerequisites and routing-peer Docker Compose bring-up.
- Adds a path-scoped CI workflow for `cdk diff` on PRs and manual `cdk deploy`, plus gitleaks allowlisting for non-secret Entra identifiers.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| netbird/scripts/routing-peer-user-data.sh | Bootstraps Docker/Compose, pulls setup key from Secrets Manager, starts routing peer via Compose. |
| netbird/scripts/control-plane-user-data.sh | Bootstraps Docker/Compose and writes Entra OIDC env for manual Netbird control-plane setup via SSM. |
| netbird/README.md | Documents architecture, prerequisites, deploy flow, and post-deploy manual steps. |
| netbird/cdk/Program.cs | Defines CDK app entrypoint and pins deployment env (shared account + region). |
| netbird/cdk/NetbirdRoutingPeerStack.cs | Provisions routing-peer EC2 + VPC + EIP + alarm recovery + setup-key secret + IAM role. |
| netbird/cdk/NetbirdControlPlaneStack.cs | Provisions control-plane EC2 + VPC + EIP + alarm recovery + IAM role reading Entra secret. |
| netbird/cdk/Netbird.Cdk.csproj | Adds CDK project configuration and embeds user-data scripts as resources. |
| netbird/cdk/EmbeddedScript.cs | Helper for reading embedded user-data scripts at synth time. |
| netbird/cdk/cdk.json | Configures CDK app execution and watch settings. |
| netbird/.gitignore | Ignores CDK/.NET outputs and local env files under netbird/. |
| netbird/.gitattributes | Enforces LF endings for the netbird/ subtree (notably for user-data scripts). |
| .gitleaks.toml | Adds narrow allowlist for known non-secret Entra GUIDs used in setup. |
| .github/workflows/netbird-deploy.yml | Adds path-scoped workflow for build + optional AWS-auth + cdk diff/deploy. |
| .github/workflows/gitleaks-self-scan.yml | Wires self-scan workflow to use the repo’s .gitleaks.toml config. |
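
The routing-peer bootstrap summarized in the table (pull the setup key from Secrets Manager, then start the peer via Compose) might look roughly like the sketch below. It is an illustration under assumptions, not the contents of `routing-peer-user-data.sh`: the function names, the secret ID, and the management URL passed in are all hypothetical. `NB_SETUP_KEY` and `NB_MANAGEMENT_URL` are the environment variables the Netbird client container reads.

```shell
#!/usr/bin/env bash
# Sketch only: function names and any secret ID / URL passed in are placeholders.

# Read a plaintext secret value from AWS Secrets Manager.
# Assumes the instance role may read the secret and a region is configured.
fetch_secret() {
  aws secretsmanager get-secret-value \
    --secret-id "$1" \
    --query SecretString \
    --output text
}

# Write the env file Docker Compose would hand to the Netbird container.
# The subshell keeps the restrictive umask from leaking into the caller.
write_peer_env() {
  local env_file="$1" setup_key="$2" management_url="$3"
  (
    umask 077
    printf 'NB_SETUP_KEY=%s\nNB_MANAGEMENT_URL=%s\n' \
      "$setup_key" "$management_url" > "$env_file"
  )
}

# Assumed usage in user-data:
#   key="$(fetch_secret "example/netbird/setup-key")"
#   write_peer_env /opt/netbird/peer.env "$key" "https://netbird.example.invalid"
#   docker compose --env-file /opt/netbird/peer.env up -d
```

Keeping the key in a root-only env file rather than inline in the Compose file means the secret never has to appear in the committed script.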
    # Install Docker Compose plugin
    mkdir -p /usr/local/lib/docker/cli-plugins
    curl -fsSL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
      -o /usr/local/lib/docker/cli-plugins/docker-compose
    chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
    # Install Docker Compose plugin
    mkdir -p /usr/local/lib/docker/cli-plugins
    curl -fsSL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
      -o /usr/local/lib/docker/cli-plugins/docker-compose
    chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
    services:
      netbird:
        image: netbirdio/netbird:latest
        container_name: netbird-routing-peer
    sg.AddIngressRule(Peer.AnyIpv4(), Port.Tcp(443), "HTTPS -- management API and dashboard");
    sg.AddIngressRule(Peer.AnyIpv4(), Port.Tcp(80), "Lets Encrypt ACME HTTP challenge");
    sg.AddIngressRule(Peer.AnyIpv4(), Port.Tcp(33073), "Management gRPC -- peer client connections");
…escription note)
- cdk.json: run the CDK app with -c Release so synth matches the CI build instead of an extra Debug compile.
- control-plane-user-data.sh: add the same pre-cutover pin/checksum TODO the routing peer already carries for the Docker Compose plugin.
- NetbirdControlPlaneStack.cs: document that the SG rule description omits the apostrophe in "Lets Encrypt" deliberately, because AWS rejects apostrophes in security-group rule descriptions.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
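The pin/checksum TODO mentioned for the Docker Compose plugin could eventually be resolved along the lines of the sketch below. It is one possible shape, not the PR's implementation: `verify_sha256` and `install_compose_pinned` are hypothetical names, and the version tag and digest are arguments the caller would take from the docker/compose release page. It relies on GNU `sha256sum` and `install`, which Amazon Linux 2023 provides.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; version and digest are supplied
# by the caller and must come from the docker/compose release being pinned.

# Succeeds only if the file's SHA-256 matches the expected hex digest.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum --check reads "<digest>  <path>" lines from stdin;
  # --status suppresses output and reports the result via the exit code.
  echo "${expected}  ${file}" | sha256sum --check --status
}

# Download a pinned Compose release and install it only if the digest matches.
install_compose_pinned() {
  local version="$1" expected_sha="$2"
  local dest=/usr/local/lib/docker/cli-plugins/docker-compose
  local tmp
  tmp="$(mktemp)"
  curl -fsSL \
    "https://github.com/docker/compose/releases/download/${version}/docker-compose-linux-x86_64" \
    -o "$tmp" || { rm -f "$tmp"; return 1; }
  verify_sha256 "$tmp" "$expected_sha" || { rm -f "$tmp"; return 1; }
  # install -D creates the cli-plugins directory if it does not exist yet.
  install -D -m 0755 "$tmp" "$dest"
  rm -f "$tmp"
}
```

Verifying before `install` means a tampered or truncated download never lands on the plugin path, which the `releases/latest` URL cannot guarantee.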
VPC reachability check: can a Netbird VPN user reach internal RDS (e.g. dev SQL Server)?
Following up on the new-VPC review: I verified this against the live AWS environment (ap-southeast-2), because it determines whether Netbird can actually replace Pritunl for internal access.
Short answer: no. As designed, a VPN user routed through the Netbird VPC cannot reach the dev account's RDS SQL Server. There are three independent barriers, each sufficient on its own.
1. No network route
Both Netbird VPC route tables contain only: No route to the dev VPC
2. No VPC peering / transit gateway
The Netbird VPCs are in zero peering connections, and there's no transit gateway in the estate. Pritunl works only because of peering the new VPC doesn't inherit:
Pritunl lives inside the shared
3. The RDS security group would block it anyway
The dev RDS SG ( The Netbird VPC CIDR ( Design-intent note: the routing peer is built to route
Side finding: deployment account drift
The two Netbird
What it would take to reach internal RDS over Netbird
If private internal access (e.g. SSMS → dev SQL over the VPN, which is what Pritunl serves today) is a goal, the isolated-VPC design needs all three of:
Cleaner alternative (the pattern the platform already proves): run the routing peer inside the shared
Key question to settle before merge: if Netbird is only ever meant to front Cloudflare-protected public apps, the isolated VPC is fine. The moment it's expected to replace Pritunl for private/internal access (RDS, Redis, internal hosts), this VPC topology can't do it without the peering + CIDR + SG work above.
(Verified via AWS API in the dev account: route tables
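The three barriers in this comment (routes, peering, security-group allowlist) can be re-checked with AWS CLI queries like the ones sketched below. This is not the exact procedure used for the verification above, whose commands are not shown. The function names are hypothetical, and the VPC and security-group IDs are arguments the caller must supply.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; IDs are supplied by the caller.

# 1. Destination CIDRs present in the VPC's route tables.
vpc_route_destinations() {
  aws ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$1" \
    --query 'RouteTables[].Routes[].DestinationCidrBlock' \
    --output text
}

# 2. Number of active peering connections with the VPC on either side.
vpc_peering_count() {
  local vpc="$1" side n total=0
  for side in requester accepter; do
    n="$(aws ec2 describe-vpc-peering-connections \
      --filters "Name=${side}-vpc-info.vpc-id,Values=${vpc}" \
                "Name=status-code,Values=active" \
      --query 'length(VpcPeeringConnections)' \
      --output text)"
    total=$((total + n))
  done
  echo "$total"
}

# 3. IPv4 CIDRs the security group allows inbound.
sg_ingress_cidrs() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[].IpRanges[].CidrIp' \
    --output text
}
```

A VPN user can reach the database only if all three line up: a route whose destination covers the target VPC, at least one active peering connection, and an ingress CIDR on the RDS security group that contains the routing peer's address.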
Follow-up: VPC placement (dedicated vs shared) + security-group minimality
Expanding on the reachability comment above, with AWS guidance, a live security-group review, and a recommendation.
TL;DR: the dedicated multi-VPC design is not wrong; it's the more-isolated model and it's the right call under circumstances spelled out below. For this deployment, though, placing Netbird in the existing shared-services VPC is the cleaner fit and still meets best practice. Either way the security groups themselves are tight.
The dedicated multi-VPC design is legitimate, and has real security benefits
To be clear up front: isolating the VPN in its own VPC (and splitting control plane vs routing peer into two) is a sound, defensible design. It is the stronger-isolation model and is the right choice when:
The control-plane / routing-peer split into separate VPCs is itself a genuine merit: the control plane never needs a route to any internal network, only the routing peer bridges inward, so isolating them lets each get the minimum. Keep that split regardless of the placement decision.
Why the shared-services VPC is nonetheless the better fit here
Given that, the shared VPC is the better pragmatic home because:
Security-group minimality (live-verified in the deployed account)
Routing peer
Control plane
Optional tightenings (minor, non-blocking): set Coturn RDS
Recommendation
The security groups are tight enough that none of this is a loose-rule finding; the decision is purely about where the VPN's trust boundary should sit, and both answers are defensible.
(SGs verified live via AWS API:
…Drata + Pritunl parity)
Addresses the PR review (Anthony) and the Drata controls the dev POC tripped:
- Place both stacks in the shared-services VPC public-subnet tier via Vpc.FromLookup (vpc-064a7525a3bcc4667) instead of a dedicated VPC, matching the Pritunl VPN. This reuses the vetted peering + RDS allowlist fabric (so developers can reach SQL Server RDS), inherits the VPC's flow logs, and keeps the control-plane/routing-peer split.
- AssociatePublicIpAddress on both instances: the shared public subnets do not auto-assign, and user-data needs egress before the EIP associates.
- CPU utilization alarm -> shared Slack topic on both instances (Drata 'Infrastructure Instance CPU Monitored').
- backup=true tag on the stateful control plane (shared AWS Backup plan), matching the platform VPN/RDS convention.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Adds the self-hosted Netbird (WireGuard) VPN as Infrastructure as Code for the autoguru-shared account (791686214595, ap-southeast-2). This is the planned replacement for the Pritunl VPN; the two run in parallel until the routing-peer egress gate (COM-145) passes, so nothing here touches Pritunl.
Per ADR-002 and the Hybrid-ZTNA-Netbird Business Case.
Structure
AWS CDK in C# (matching the rest of our CDK infra), under `netbird/`. Two independent stacks (no "God" stack):
- `NetbirdControlPlaneStack`
- `NetbirdRoutingPeerStack`
Each: dedicated VPC (public subnet + EIP, no NAT), Amazon Linux 2023 + Docker, IMDSv2 required, encrypted EBS, CloudWatch auto-recovery, an SSM-only IAM role (no inbound SSH) scoped to exactly the secret it needs. EC2 user-data lives in `netbird/scripts/*.sh` and is embedded into the assembly.
DNS (`netbird.autoguru.com.au`) and the routing-peer origin allowlist are managed in Cloudflare, so the stacks only output the Elastic IPs.
CI
`.github/workflows/netbird-deploy.yml` is path-scoped to `netbird/**`: PRs run `cdk diff` (read-only), deploys are a manual `workflow_dispatch` (never implicit on merge). Requires repo secret `AWS_DEPLOY_ROLE_ARN` (a shared-account role assumable via OIDC).
Validation
- `dotnet build -c Release`: 0 warnings / 0 errors.
- `cdk synth` of both stacks against the real shared account: succeeds.
Notes / follow-ups
- `AWS_DEPLOY_ROLE_ARN` repo secret.
- `netbird/scripts/routing-peer-user-data.sh` pins binaries to `:latest` for the POC with a TODO to pin before production cutover.
- `cdk bootstrap` is needed.
🤖 Generated with Claude Code