23 commits
a51a99f
Enable HA cluster with kube-vip
sghosh23 Nov 20, 2025
8d7eb8d
Add changelog
sghosh23 Nov 20, 2025
db35ec3
clean up and fix doc
sghosh23 Nov 20, 2025
e7ec736
update doc
sghosh23 Nov 20, 2025
7f9ef2d
keep inventory doc simple
sghosh23 Nov 20, 2025
ae5e636
Add necessary vars to and image to deploy HA k8s cluster on CI based …
sghosh23 Nov 20, 2025
c019073
Deploy kube-vip when the cluster is already up
sghosh23 Nov 21, 2025
23f14be
Disable kube-vip for CI deployment
sghosh23 Nov 21, 2025
ebce003
Try fix CI deployment
sghosh23 Nov 21, 2025
3bc2c20
fix ther bootstrapping by provding empty dic fo loadbalancer_apiserver
sghosh23 Nov 21, 2025
6724e56
Use two phase approach
sghosh23 Nov 23, 2025
a8eefba
fix k8s_cluster inventroy path logic
sghosh23 Nov 24, 2025
25fecf9
remove fixed interface name
sghosh23 Nov 24, 2025
16fe249
try with availability check
sghosh23 Nov 24, 2025
3a5518c
try with fixed alias IP
sghosh23 Nov 25, 2025
ec4c1a0
drop alias
sghosh23 Nov 25, 2025
ff1c347
Add the right interface
sghosh23 Nov 25, 2025
94c3b51
try alias_ip with dterministic ips for the servers
sghosh23 Nov 25, 2025
909533d
fix variable precedence issue and add extended timeout on leader sele…
sghosh23 Nov 26, 2025
56bea84
clean up checks which was blocking the deployment
sghosh23 Nov 26, 2025
61484ff
Route the VIP address to the kubenode from adminhost
sghosh23 Nov 27, 2025
bea385f
Disable kube-vip for CI deployment
sghosh23 Nov 28, 2025
59d71ac
Fix sonarCloud complains
sghosh23 Nov 28, 2025
4 changes: 4 additions & 0 deletions ansible/inventory/offline/99-static
@@ -144,6 +144,10 @@ postgresql_network_interface = enp1s0
kube-master
kube-node

# Kubernetes cluster configuration (kube-vip, vip_address, vip_interface)
# is defined in group_vars/k8s-cluster/k8s-cluster.yml
# See offline/kube-vip-ha-setup.md for HA setup documentation

# Add all cassandra nodes here
[cassandra]
# cassandra1
65 changes: 65 additions & 0 deletions ansible/inventory/offline/README.md
@@ -0,0 +1,65 @@
# Offline Inventory Configuration

Ansible inventory for offline/air-gapped deployments of Wire infrastructure.

## Directory Structure

```
offline/
├── 99-static # Main inventory file
├── group_vars/
│ ├── all/offline.yml # Base settings (k8s version, etc.)
│ ├── k8s-cluster/k8s-cluster.yml # kube-vip HA configuration
│ ├── postgresql/postgresql.yml # PostgreSQL settings
│ └── demo/offline.yml # Demo overrides
└── artifacts/ # Generated (kubeconfig, etc.)
```

## Configuration Files

| File | Purpose |
|------|---------|
| `99-static` | Define hosts and group memberships |
| `group_vars/all/offline.yml` | Base settings (k8s version, container runtime) |
| `group_vars/k8s-cluster/k8s-cluster.yml` | kube-vip HA, API server, networking |
| `group_vars/postgresql/postgresql.yml` | PostgreSQL configuration |

## Key Variables to Customize

**In `99-static`:**
- Host IP addresses (`ansible_host` and `ip`)
- Node assignments to groups (`[kube-master]`, `[kube-node]`, `[etcd]`)

**In `group_vars/k8s-cluster/k8s-cluster.yml`:**
- `kube_vip_address` - Virtual IP for HA (e.g., `192.168.122.100`)
- `kube_vip_interface` - Network interface (e.g., `enp1s0`)

**In `group_vars/all/offline.yml`:**
- `kube_version` - Kubernetes version
- Network settings (usually defaults are fine)
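
Putting the key variables together, a minimal HA override might look like this (the IP and interface are illustrative placeholders; substitute values from your own network):

```yaml
# group_vars/k8s-cluster/k8s-cluster.yml -- illustrative values only
kube_vip_enabled: true
kube_vip_address: "192.168.122.100"   # unused IP in the control-plane subnet
kube_vip_interface: "enp1s0"          # from `ip -br addr show` on the nodes
```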


## Documentation

- **Kubespray**: https://github.com/kubernetes-sigs/kubespray
- **Wire Docs**: https://docs.wire.com/

## Important Notes

- VIP must be in the same subnet as the control plane nodes
- VIP must not be in a DHCP range
- etcd requires an odd number of members (3, 5, 7)
- Keep `artifacts/` directory secure (contains admin kubeconfig)
- For production, encrypt sensitive files with SOPS
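
The odd-member rule for etcd follows from quorum arithmetic: a cluster of `n` members needs `floor(n/2) + 1` votes, so an even member count raises cost without raising fault tolerance. A quick sketch:

```shell
# Quorum for an etcd cluster of n members is floor(n/2)+1.
# Note that 4 members tolerate no more failures than 3 do.
for n in 1 3 4 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

This is why 3, 5, or 7 members are the recommended sizes: each step up adds one more tolerated failure.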

## Troubleshooting

**Inventory not found:**
```bash
ansible-inventory -i ansible/inventory/offline --list
```

**Can't SSH to nodes:**
```bash
ansible -i ansible/inventory/offline all -m ping
```
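
**VIP not in the control-plane subnet:** a quick offline sanity check, sketched in plain bash arithmetic (the node IP, VIP, and prefix length below are placeholders; use your own values):

```shell
# Convert a dotted-quad IPv4 address to an integer, then compare
# the network portions of the node IP and the candidate VIP.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

node_ip="192.168.122.21"   # a control-plane node (placeholder)
vip="192.168.122.100"      # candidate kube_vip_address (placeholder)
prefix=24                  # subnet prefix length

mask=$(( 0xFFFFFFFF << (32 - prefix) & 0xFFFFFFFF ))
if [ $(( $(ip_to_int "$node_ip") & mask )) -eq $(( $(ip_to_int "$vip") & mask )) ]; then
  echo "VIP $vip is in the node's /$prefix subnet"
else
  echo "VIP $vip is OUTSIDE the node's /$prefix subnet" >&2
fi
```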
103 changes: 103 additions & 0 deletions ansible/inventory/offline/group_vars/k8s-cluster/k8s-cluster.yml
@@ -0,0 +1,103 @@
---
# Kubernetes cluster configuration for offline deployment
#
# This file contains configuration overrides for the Kubernetes cluster
# deployed via Kubespray. These settings override the defaults in
# ansible/roles-external/kubespray/roles/kubespray-defaults/defaults/main/

# ==============================================================================
# kube-vip Configuration for High Availability Control Plane
# ==============================================================================
#
# kube-vip provides a Virtual IP (VIP) for the Kubernetes API server,
# enabling automatic failover between control plane nodes without requiring
# external load balancers. This makes it well suited for bare-metal and
# air-gapped deployments.
#
# Reference: https://kube-vip.io/

# Enable kube-vip (optional - set to true if you need HA control plane)
kube_vip_enabled: false

# Enable control plane VIP (required for HA)
kube_vip_controlplane_enabled: true

# Virtual IP address for the Kubernetes API server
# IMPORTANT: This must be:
# - In the same subnet as your control plane nodes
# - Unused and not in DHCP range
# - Accessible from all nodes and external clients
#
# Set the appropriate VIP address for your environment
# Example: If control plane nodes are 192.168.122.21-23, use 192.168.122.100
kube_vip_address: "192.168.122.100"

# Network interface to bind the VIP to
# Find this by running: ssh kubenode1 "ip -br addr show"
#
# For Hetzner Cloud: Use "enp7s0" (private network interface)
# For other environments: Check with "ip -br addr show"
# Common values: eth0, enp1s0, enp7s0
kube_vip_interface: "enp1s0"

# Use ARP for Layer 2 VIP management (recommended for most deployments)
# Set to false only if using BGP for Layer 3 routing
kube_vip_arp_enabled: true

# Enable kube-vip for LoadBalancer services (optional)
# Set to true if you want kube-vip to also handle LoadBalancer service IPs
# For control plane HA only, keep this false
kube_vip_services_enabled: false

# Required for kube-vip with ARP mode
# Prevents kube-proxy from responding to ARP requests for the VIP
kube_proxy_strict_arp: true

# Leader election timing (fix for kube-vip GitHub issue #453)
# Increased timeouts prevent "context deadline exceeded" errors during lease acquisition
# Default values are too aggressive for cloud environments with slower etcd/API responses
# These settings are particularly important for Hetzner Cloud and similar providers
kube_vip_leader_election_enabled: true
kube_vip_leaseduration: 30 # seconds (default: 15)
kube_vip_renewdeadline: 20 # seconds (default: 10)
kube_vip_retryperiod: 4 # seconds (default: 2)

# ==============================================================================
# Bootstrap Strategy for kube-vip HA
# ==============================================================================
#
# IMPORTANT: The following configurations are COMMENTED OUT to avoid bootstrap
# chicken-and-egg problem during automated cluster deployment.
#
# For NEW cluster deployment via bin/offline-cluster.sh:
# - These remain commented out
# - Phase 1 bootstraps without loadbalancer_apiserver (kubeadm uses node IP)
# - Phase 2 passes loadbalancer_apiserver dynamically via -e flag
#
# For MANUAL kube-vip setup on EXISTING cluster:
# - Uncomment the sections below
# - Update the IP addresses to match your VIP
# - Run: ansible-playbook -i inventory/offline/hosts.ini kubernetes.yml --tags=node,kube-vip,master,client
#
# See: offline/kube-vip-ha-setup.md for detailed documentation
#
# Reference: kubespray's test approach in
# ansible/roles-external/kubespray/tests/files/packet_centos7-flannel-addons-ha.yml

# API server advertise address (use VIP for consistency)
# This is the address the API server advertises to clients
# apiserver_loadbalancer_domain_name: "192.168.122.100"

# Configure API server endpoint to use VIP
# This tells all Kubernetes components to connect via the VIP
# loadbalancer_apiserver:
# address: "192.168.122.100"
# port: 6443

# Disable localhost load balancer since we have VIP
# When using kube-vip, we don't need the nginx localhost proxy
# loadbalancer_apiserver_localhost: false

# Add VIP to API server SSL certificates
# This ensures the API server certificate is valid for the VIP address
# supplementary_addresses_in_ssl_keys:
# - "192.168.122.100"
89 changes: 70 additions & 19 deletions bin/offline-cluster.sh
@@ -9,18 +9,18 @@ set -x

ls $ANSIBLE_DIR/inventory/offline

if [ -f "$ANSIBLE_DIR/inventory/offline/hosts.ini" ]; then
if [[ -f "$ANSIBLE_DIR/inventory/offline/hosts.ini" ]]; then
INVENTORY_FILE="$ANSIBLE_DIR/inventory/offline/hosts.ini"
elif [ -f "$ANSIBLE_DIR/inventory/offline/inventory.yml" ]; then
elif [[ -f "$ANSIBLE_DIR/inventory/offline/inventory.yml" ]]; then
INVENTORY_FILE="$ANSIBLE_DIR/inventory/offline/inventory.yml"
else
echo "No inventory file in ansible/inventory/offline/. Please supply an $ANSIBLE_DIR/inventory/offline/inventory.yml or $ANSIBLE_DIR/inventory/offline/hosts.ini"
exit -1
echo "No inventory file in ansible/inventory/offline/. Please supply an $ANSIBLE_DIR/inventory/offline/inventory.yml or $ANSIBLE_DIR/inventory/offline/hosts.ini" >&2
exit 1
fi

if [ -f "$ANSIBLE_DIR/inventory/offline/hosts.ini" ] && [ -f "$ANSIBLE_DIR/inventory/offline/inventory.yml" ]; then
echo "Both hosts.ini and inventory.yml provided in ansible/inventory/offline! Pick only one."
exit -1
if [[ -f "$ANSIBLE_DIR/inventory/offline/hosts.ini" ]] && [[ -f "$ANSIBLE_DIR/inventory/offline/inventory.yml" ]]; then
echo "Both hosts.ini and inventory.yml provided in ansible/inventory/offline! Pick only one." >&2
exit 1
fi

echo "using ansible inventory: $INVENTORY_FILE"
@@ -31,30 +31,81 @@ echo "using ansible inventory: $INVENTORY_FILE"
# other hosts to fetch debs from it.
#
# If this step fails partway, and you know that parts of it completed, the `--skip-tags debs,binaries,containers,containers-helm,containers-other` tags may come in handy.
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/setup-offline-sources.yml
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/setup-offline-sources.yml"

# Run kubespray until docker is installed and runs. This allows us to preseed the docker containers that
# are part of the offline bundle
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/kubernetes.yml --tags bastion,bootstrap-os,preinstall,container-engine
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/kubernetes.yml" --tags bastion,bootstrap-os,preinstall,container-engine

# With ctr being installed on all nodes that need it, seed all container images:
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/seed-offline-containerd.yml
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/seed-offline-containerd.yml"

# Install NTP
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/sync_time.yml -v
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/sync_time.yml" -v

# Run the rest of kubespray. This should bootstrap a kubernetes cluster successfully:
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/kubernetes.yml --skip-tags bootstrap-os,preinstall,container-engine,multus
# Phase 1: Bootstrap WITHOUT loadbalancer_apiserver so kubeadm uses node IP
# We skip kube-vip to avoid race condition with VIP
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/kubernetes.yml" \
--skip-tags bootstrap-os,preinstall,container-engine,multus,kube-vip

# Phase 2: Configure API endpoint (kube-vip or direct node IP)
# Check if kube-vip is enabled in the inventory
INVENTORY_DIR="$(dirname "$INVENTORY_FILE")"

# Check kube_vip_enabled flag from inventory
KUBE_VIP_ENABLED=$(yq eval '.k8s-cluster.vars.kube_vip_enabled // ""' "$INVENTORY_FILE" 2>/dev/null || echo "")
if [[ -z "$KUBE_VIP_ENABLED" ]] || [[ "$KUBE_VIP_ENABLED" = "null" ]]; then
GROUP_VARS_FILE="$INVENTORY_DIR/group_vars/k8s-cluster/k8s-cluster.yml"
if [[ -f "$GROUP_VARS_FILE" ]]; then
KUBE_VIP_ENABLED=$(yq eval '.kube_vip_enabled // ""' "$GROUP_VARS_FILE" 2>/dev/null || echo "")
fi
fi

if [[ "$KUBE_VIP_ENABLED" = "true" ]]; then
# ===== kube-vip HA Mode =====
# Extract VIP address from inventory
VIP_ADDRESS=$(yq eval '.k8s-cluster.vars.kube_vip_address // ""' "$INVENTORY_FILE" 2>/dev/null || echo "")
if [[ -z "$VIP_ADDRESS" ]] || [[ "$VIP_ADDRESS" = "null" ]]; then
GROUP_VARS_FILE="$INVENTORY_DIR/group_vars/k8s-cluster/k8s-cluster.yml"
if [[ -f "$GROUP_VARS_FILE" ]]; then
VIP_ADDRESS=$(yq eval '.kube_vip_address // ""' "$GROUP_VARS_FILE" 2>/dev/null || echo "")
fi
fi

if [[ -n "$VIP_ADDRESS" ]] && [[ "$VIP_ADDRESS" != "null" ]]; then
echo "Deploying kube-vip with VIP: $VIP_ADDRESS"
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/kubernetes.yml" \
--tags kube-vip,client \
-e "{\"loadbalancer_apiserver\": {\"address\": \"$VIP_ADDRESS\", \"port\": 6443}}" \
-e "apiserver_loadbalancer_domain_name=$VIP_ADDRESS" \
-e "loadbalancer_apiserver_localhost=false" \
-e "{\"supplementary_addresses_in_ssl_keys\": [\"$VIP_ADDRESS\"]}"

export KUBECONFIG="$ANSIBLE_DIR/inventory/offline/artifacts/admin.conf"
echo "✓ kube-vip deployed with VIP: $VIP_ADDRESS"
else
echo "ERROR: kube_vip_enabled=true but no VIP address found in inventory!" >&2
exit 1
fi
else
# ===== Direct Node IP Mode (No kube-vip) =====
# Phase 1 already configured kubeconfig with first node IP during cluster bootstrap
# No additional configuration needed
echo "kube-vip disabled, using direct node IP from cluster bootstrap"
export KUBECONFIG="$ANSIBLE_DIR/inventory/offline/artifacts/admin.conf"
echo "✓ Kubernetes API endpoint configured during Phase 1 bootstrap"
fi

# Deploy all other services which don't run in kubernetes.
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/cassandra.yml
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/elasticsearch.yml
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/minio.yml
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/postgresql-deploy.yml
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/cassandra.yml"
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/elasticsearch.yml"
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/minio.yml"
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/postgresql-deploy.yml"

# Uncomment to deploy external RabbitMQ (temporarily commented out until implemented in CD). PS: remove --skip-tags=rabbitmq-external from the next section
#ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/roles/rabbitmq-cluster/tasks/configure_dns.yml
#ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/rabbitmq.yml
#ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/roles/rabbitmq-cluster/tasks/configure_dns.yml"
#ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/rabbitmq.yml"

# create helm values that tell our helm charts what the IP addresses of cassandra, elasticsearch and minio are:
ansible-playbook -i $INVENTORY_FILE $ANSIBLE_DIR/helm_external.yml --skip-tags=rabbitmq-external
ansible-playbook -i "$INVENTORY_FILE" "$ANSIBLE_DIR/helm_external.yml" --skip-tags=rabbitmq-external
1 change: 1 addition & 0 deletions changelog.d/3-deploy-builds/enable-ha-k8s-cluster
@@ -0,0 +1 @@
Added: kube-vip v0.8.0 for high-availability Kubernetes control plane with automatic failover, including comprehensive documentation and offline build configuration (enabled in CI to validate production deployment path)