diff --git a/.gitignore b/.gitignore
index 485dee6..1dfe0bc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,7 @@
 .idea
+.env
+aurora-*-openrc.sh
+deploy-production.sh
+incident-snapshots/
+scripts/diagnose-volumes.sh
+scripts/safe-delete-volumes.shopenstack/values-prod.local.yaml
diff --git a/README.md b/README.md
index 3192da7..b39d660 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,88 @@ Note that this repository has both information for small and large workloads in
 - [helm client](https://helm.sh/docs/intro/install/) (v3.12.0 or higher)
 - [kubectl client](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (v1.27.0 or higher)
 
+## Configuration Setup
+
+Before installing the chart, either use the tracked baseline `openstudio-server/values.yaml` or create your own values file from one of the provided templates:
+
+- `values_small.templateyaml` - For small workloads and testing
+- `values_large.templateyaml` - For large-scale production workloads
+- `values_production.templateyaml` - For general production deployments (specifically openstack)
+
+**Copy the appropriate template and customize it for your environment:**
+
+```bash
+cp openstudio-server/values_small.templateyaml openstudio-server/values.yaml
+# OR
+cp openstudio-server/values_large.templateyaml openstudio-server/values.yaml
+# OR
+cp openstudio-server/values_production.templateyaml openstudio-server/values.yaml
+```
+
+Then edit your chosen values file (for example `openstudio-server/values.yaml`) to:
+- Set your cloud provider in `global.provider.name` (`google`, `aws`, `azure`, or `openstack`). This is required.
+- Configure your app secret source:
+  - Primary path: set `secrets.existingSecret` and keep `secrets.create=false`
+  - Alternate path: set `secrets.create=true` and provide `db.username`, `db.password`, `redis.password`, and `web.secret_key_value`
+- If Redis credentials include URI-reserved characters, set an explicit `redis.url` override (for example `redis://:encoded-password@queue:6379`).
+- Adjust resource allocations for your workload
+- Configure storage sizes
+
+`provider.name` is deprecated and disabled by default. Any values file that still sets `provider.name` should be migrated to `global.provider.name`.
+For temporary migration-only compatibility, you can opt in with:
+
+```yaml
+global:
+  provider:
+    allowLegacyName: true
+```
+
+This legacy fallback is intended for staged upgrades only.
+
+Provider-aware scheduling defaults are automatic and based on `global.provider.name`:
+
+Provider | Label Key | Web Node Group | Worker Node Group
+---------|-----------|----------------|------------------
+openstack | `capi.stackhpc.com/node-group` | `web` | `worker`
+aws/google/azure (default) | `nodegroup` | `web-group` | `worker-group`
+
+Provider-aware infrastructure defaults are also automatic when values are omitted:
+
+Setting | openstack default | aws/google/azure default
+--------|-------------------|------------------------
+`db.persistence.storageClass` | `nfs` | `ssd`
+`redis.persistence.storageClass` | `nfs` | `ssd`
+`load_balancer.externalTrafficPolicy` | `Cluster` | `Local`
+
+For OpenStack production deployments, `values_production.templateyaml` explicitly sets:
+
+- `db.persistence.storageClass: csi-cinder`
+- `redis.persistence.storageClass: csi-cinder`
+
+This keeps MongoDB/Redis off the shared NFS assets volume used by worker outputs.
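The `redis.url` override in the bullet list above needs the password percent-encoded before it goes into the URI. A minimal sketch of that step follows, assuming an ASCII password and bash; the `urlencode` helper and the sample password are hypothetical and not part of this repository, while the `queue:6379` host comes from the README example.

```bash
# Hypothetical helper: percent-encode a string so it is safe inside a URI.
# Unreserved characters (letters, digits, ".", "~", "_", "-") pass through.
urlencode() {
  local LC_ALL=C            # C locale: plain ASCII ranges in the pattern below
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;   # e.g. "@" becomes "%40"
    esac
  done
  printf '%s' "$out"
}

# Sample password only; substitute your real credential source.
encoded="$(urlencode 'p@ss:w/rd#1')"
echo "redis://:${encoded}@queue:6379"
# prints: redis://:p%40ss%3Aw%2Frd%231@queue:6379
```

The resulting string is what would go into `redis.url` in a local values file, which keeps the credential out of shell history and `--set` flags.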
+
+NFS mount options are intentionally conservative by default in template files:
+
+- Default: `mountOptions: ["vers=4"]`
+- Optional tuning (environment dependent): `sync`, `rsize=...`, `wsize=...`
+
+These options are not cloud-provider features; compatibility depends on the Kubernetes node OS/kernel NFS client and the backing NFS server behavior.
+
+If your cluster uses different label names, set overrides in `global.nodeGroups`:
+
+```yaml
+global:
+  provider:
+    name: "openstack"
+  nodeGroups:
+    labelKey: ""
+    web: ""
+    worker: ""
+    affinityMode: "preferred" # required | preferred | disabled
+```
+
+**Note:** `openstudio-server/values.yaml` is a tracked baseline for reproducible defaults. Put environment-specific or sensitive overrides in a separate local file (for example `openstudio-server/values.local.yaml`) and pass it with `-f`.
+
 ## Installing the Chart
 
 To install the helm chart with the chart name `openstudio-server`, you can run the following command in the root directory of this repo. This assumes you already have a Kubernetes cluster up and running. If you do not, please refer to [google](/google/README.md) or [aws](/aws/README.md) in this repo.
@@ -22,71 +104,179 @@ To install the helm chart with the chart name `openstudio-server`, you can run t
 ### For Google
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=google
+kubectl -n openstudio-server create secret generic openstudio-app-secrets \
+  --from-literal=db-username="openstudio" \
+  --from-literal=db-password="replace-with-strong-password" \
+  --from-literal=redis-password="replace-with-strong-password" \
+  --from-literal=web-secret-key="replace-with-long-random-secret"
+helm upgrade --install openstudio-server ./openstudio-server \
+  --namespace openstudio-server --create-namespace \
+  --set global.provider.name=google
 ```
 
 ### For Amazon
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=aws
+kubectl -n openstudio-server create secret generic openstudio-app-secrets \
+  --from-literal=db-username="openstudio" \
+  --from-literal=db-password="replace-with-strong-password" \
+  --from-literal=redis-password="replace-with-strong-password" \
+  --from-literal=web-secret-key="replace-with-long-random-secret"
+helm upgrade --install openstudio-server ./openstudio-server \
+  --namespace openstudio-server --create-namespace \
+  --set global.provider.name=aws
 ```
 
 ### For Azure
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=azure
+kubectl -n openstudio-server create secret generic openstudio-app-secrets \
+  --from-literal=db-username="openstudio" \
+  --from-literal=db-password="replace-with-strong-password" \
+  --from-literal=redis-password="replace-with-strong-password" \
+  --from-literal=web-secret-key="replace-with-long-random-secret"
+helm upgrade --install openstudio-server ./openstudio-server \
+  --namespace openstudio-server --create-namespace \
+  --set global.provider.name=azure
 ```
 
-## Uninstalling the Chart
+### For OpenStack
 
-To uninstall/delete the `openstudio-server` helm chart:
+Use an existing OpenStack-managed Kubernetes cluster when possible (for example, a
cluster created through Azimuth or provided by your OpenStack administrators). This is the recommended path.
+
+The `openstack/` directory in this repository contains legacy self-managed cluster automation (Terraform/OpenTofu + Kubespray). That path is not actively tested and may not work in all environments; use it at your own risk.
+
+Once your Kubernetes cluster is available and your kubeconfig is configured, install the Helm chart:
 
 ```bash
-helm uninstall openstudio-server
+kubectl -n openstudio-server create secret generic openstudio-app-secrets \
+  --from-literal=db-username="openstudio" \
+  --from-literal=db-password="replace-with-strong-password" \
+  --from-literal=redis-password="replace-with-strong-password" \
+  --from-literal=web-secret-key="replace-with-long-random-secret"
+helm upgrade --install openstudio-server ./openstudio-server \
+  --namespace openstudio-server --create-namespace \
+  --set global.provider.name=openstack
 ```
 
-The command removes all the Kubernetes components associated with the chart and deletes the release _including_ persistent volumes. See more about persistent volumes below.
+`secrets.existingSecret` validation is enabled by default during install/upgrade:
 
-## Configuration
+```bash
+--set secrets.validateExistingSecret=true
+```
 
-The following table lists the configurable parameters of the OpenStudio-server chart and their default values. You can override any of these values by specifying each parameter using the `--set key=value[,key=value]` argument to `helm install`.
For example, to change the data storage for NFS which stores the data points to 300GB you would run this install command:
+For offline/render-only workflows (for example CI `helm template` jobs without cluster access), explicitly disable lookup-based validation:
 
-### For Google
+```bash
+--set secrets.validateExistingSecret=false
+```
+
+The chart also supports chart-managed secret creation as an alternate mode:
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=google --set nfs-server-provisioner.persistence.size=300Gi
+helm upgrade --install openstudio-server ./openstudio-server \
+  --namespace openstudio-server --create-namespace \
+  --set global.provider.name=google \
+  --set secrets.existingSecret= \
+  --set secrets.create=true \
+  --set db.username=openstudio \
+  --set db.password=replace-with-strong-password \
+  --set redis.password=replace-with-strong-password \
+  --set web.secret_key_value=replace-with-long-random-secret
 ```
 
-### For Amazon
+**Note:** Instead of repeated `--set` flags, prefer an environment-specific values file and pass it with `-f`.
+Use `./scripts/install-dry-run.sh` to run lint/render checks across default and OpenStack values before deployment.
+For a quick install helper script, run `PROVIDER=openstack ./scripts/install.sh` (supported providers: `aws`, `google`, `azure`, `openstack`).
+
+`scripts/install.sh` now fails fast on secret validation by default:
+
+- Default mode: `SECRET_MODE=existing` and `EXISTING_SECRET_NAME=openstudio-app-secrets`
+- Required behavior: if `NAMESPACE` does not exist, the script creates it before validating the secret.
The existing secret must then exist in that namespace and contain non-empty keys:
+  - `db-username`
+  - `db-password`
+  - `redis-password`
+  - `web-secret-key`
+- Alternate mode: set `SECRET_MODE=create` and provide `DB_USERNAME`, `DB_PASSWORD`, `REDIS_PASSWORD`, and `WEB_SECRET_KEY`
+  - In create mode, `scripts/install.sh` writes credentials to a temporary values file and passes it with `--values` (instead of secret-bearing `--set` flags), then removes the file on exit.
+
+To run secret preflight directly:
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=aws --set nfs-server-provisioner.persistence.size=300Gi
+./scripts/validate-app-secret.sh --namespace openstudio-server --secret-name openstudio-app-secrets
 ```
 
-### For Azure
+## Uninstalling the Chart
+
+To uninstall/delete the `openstudio-server` helm chart:
 
 ```bash
-helm install openstudio-server ./openstudio-server --set provider.name=azure --set nfs-server-provisioner.persistence.size=300Gi
+helm uninstall openstudio-server
+```
+
+The command removes all the Kubernetes components associated with the chart and deletes the release _including_ persistent volumes. See more about persistent volumes below.
+
+## Configuration
+
+The following table lists the configurable parameters of the OpenStudio-server chart and their default values. You can override any of these values in your `values.yaml` file (see Configuration Setup section above).
+
+For example, to change the data storage for NFS which stores the data points to 1Ti, modify the `nfs-server-provisioner.persistence.size` parameter in your `values.yaml`:
+
+```yaml
+nfs-server-provisioner:
+  persistence:
+    size: 1Ti
+
+nfs_pvc:
+  storage: 900Gi
 ```
 
+**Sizing rule:** `nfs_pvc.storage` must stay below `nfs-server-provisioner.persistence.size` (recommended 85-95%) so dynamic NFS claim provisioning has filesystem/provisioner headroom.
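The sizing rule above can be checked with integer arithmetic. The sketch below prints the recommended 85-95% window for a backend size given in Gi; the `nfs_pvc_bounds` function name is hypothetical, and the two sample sizes are the 1Ti example and the 550Gi chart default from this README.

```bash
# Hypothetical helper: print the 85% and 95% bounds (whole Gi, rounded down)
# for an nfs-server-provisioner.persistence.size given in Gi.
nfs_pvc_bounds() {
  local backend_gi="$1"
  echo "$(( backend_gi * 85 / 100 ))Gi-$(( backend_gi * 95 / 100 ))Gi"
}

nfs_pvc_bounds 1024   # 1Ti backend          -> 870Gi-972Gi
nfs_pvc_bounds 550    # chart default backend -> 467Gi-522Gi
```

Both documented pairs fall inside their window: `900Gi` against `1Ti`, and the default `500Gi` against `550Gi`.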
+
 Parameter | Description | Default
 --------- | ----------- | -------
 nfs-server-provisioner.persistence.size | Size of the volume for storing the data point results | 550Gi |
+nfs_pvc.storage | Shared RWX claim request consumed by web/rserve/background pods; keep below backend NFS size | 500Gi |
 db.persistence.size | Size of the volume for MongoDB | 200Gi |
+global.provider.allowLegacyName | Temporary migration flag that permits legacy `provider.name` only when `global.provider.name` is unset | false |
 cluster.name | Kubernetes AWS or Google cluster name. If you change the default name you need to set this name here otherwise AWS auto-scaling will not work correctly | openstudio-server |
 worker_hpa.minReplicas | Worker pods that run the simulations | 2 |
-worker_hpa.maxReplicas | Maximum Worker pods that run the simulations | 20 |
+worker_hpa.maxReplicas | Maximum Worker pods that run the simulations | 50 |
 worker_hpa.targetCPUUtilizationPercentage | When aggregate CPU % of worker pods exceed threshold begin scaling. | 50 |
+worker.queues | Comma-separated worker queues consumed by simulation workers. Include `requeued` to drain requeue backlog automatically. | simulations,requeued |
+redis.url | Optional explicit Redis URI used for `REDIS_URL`; recommended when credentials contain URI-reserved characters | "" |
+load_balancer.annotations | Optional extra annotations map applied to the LoadBalancer Service | {} |
+load_balancer.sourceRanges | Optional `loadBalancerSourceRanges` list; some OpenStack Octavia providers ignore this setting | [] |
 web_background.replicas | Number of projects/analyses to run in parallel. __*Note__ Algorithmic runs are currently not supported to run in parallel. Keep default value of 1 for these types of analyses. | 1 |
-web_background.container.image | Container to run the web background. Can use a custom image to override default | nrel/openstudio-server:3.7.0 |
-web.container.image | Container to run the web front-end.
Can use a custom image to override default | nrel/openstudio-server:3.7.0 |
-worker.container.image | Container to run the worker. Can use a custom image to override default | nrel/openstudio-server:3.7.0 |
-rserve.container.image | Container to run r server. Can use a custom image to override default | nrel/openstudio-rserve:3.7.0 |
+web_background.container.startup.maxRetries | Maximum retries when `start-web-background` exits during startup (for transient DB/Redis races) | 12 |
+web_background.container.startup.retryDelaySeconds | Delay between web-background startup retries | 10 |
+worker.container.startup.maxRetries | Maximum retries when `start-workers` exits during startup (for transient DB/Redis races) | 12 |
+worker.container.startup.retryDelaySeconds | Delay between worker startup retries | 10 |
+worker.container.preStop.enabled | Enables worker graceful drain preStop hook | true |
+worker.container.preStop.signal | Signal sent to resque processes during preStop drain | "3" |
+worker.container.preStop.pollIntervalSeconds | Polling interval while waiting for ruby/openstudio process drain | 30 |
+worker.container.preStop.maxWaitSeconds | Upper bound for worker preStop wait loop before allowing termination | 5100 |
+global.images.org | Docker image organization/registry namespace for OpenStudio images | nrel |
+global.images.serverRepository | Repository name used by web, web-background, and worker containers | openstudio-server |
+global.images.rserveRepository | Repository name used by rserve container | openstudio-rserve |
+global.images.tag | Shared image tag used for both server and rserve repositories | 3.10.0 |
+web_background.container.image | Optional explicit override for web-background image. If omitted, chart uses global.images.* defaults | (derived) |
+web.container.image | Optional explicit override for web image. If omitted, chart uses global.images.* defaults | (derived) |
+worker.container.image | Optional explicit override for worker image.
If omitted, chart uses global.images.* defaults | (derived) |
+rserve.container.image | Optional explicit override for rserve image. If omitted, chart uses global.images.* defaults | (derived) |
+
+**Note:** For best practices, create your own `values.yaml` from one of the template files rather than modifying configuration via `--set` flags. See the Configuration Setup section above.
 
 #### For Large Workloads
 
-Copy the text from inside the [large template values file](/openstudio-server/values_large.templateyaml)] and paste it inside of the [values file](/openstudio-server/values.yaml). Do this before using the `helm install ...` command.
+Use the [large template values file](/openstudio-server/values_large.templateyaml) as your starting point:
+
+```bash
+cp openstudio-server/values_large.templateyaml openstudio-server/values.yaml
+```
 
-Additionally, note that with large workloads you may have issues with downloading container images from Docker Hub if you have a lot of worker nodes. Therefore, you may want to upload the container images into the cloud's container registry and then update the container image path in the [values file](/openstudio-server/values.yaml). This [article](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html#:~:text=Identify%20the%20local%20image%20to,container%20images%20on%20your%20system.&text=You%20can%20identify%20an%20image,tag%20name%20combination%20to%20use.) has instructions on how to do this for aws' Elastic Container Registry (ECR).
+Then customize as needed before running `helm upgrade --install`.
+
+Additionally, note that with large workloads you may have issues with downloading container images from Docker Hub if you have a lot of worker nodes. Therefore, you may want to upload the container images into the cloud's container registry and then update the container image paths in your `values.yaml` file.
This [article](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html#:~:text=Identify%20the%20local%20image%20to,container%20images%20on%20your%20system.&text=You%20can%20identify%20an%20image,tag%20name%20combination%20to%20use.) has instructions on how to do this for aws' Elastic Container Registry (ECR).
 
 ## Accessing OpenStudio Server
 
@@ -187,8 +377,163 @@ This helm chart provisions persistent storage for the Database (MongoDB) and the
 
 While it's possible to change the storage to use `Retain` vs `Delete`, the helm chart will need to be reconfigured to allow to attach to existing volumes. This will be worked on as an enhancement for a future release.
 
+### NFS Saturation Recovery (No-Pruning Retention)
+
+If OpenStudio begins returning HTTP 500 and MongoDB logs show `No space left on device`, recover in this order:
+
+```bash
+# 1) Confirm NFS backend fullness and impacted pods
+kubectl -n openstudio-server get pods
+kubectl -n openstudio-server logs deploy/db --tail=100
+kubectl -n openstudio-server exec deploy/openstudio-server-nfs-server-provisioner -- df -h /export
+
+# 2) Verify storage classes and expansion support
+kubectl -n openstudio-server get pvc nfs-pvc-data -o jsonpath='{.spec.storageClassName}{"\n"}'
+kubectl get storageclass -o yaml | grep -i allowVolumeExpansion
+
+# 3) Expand backend volume claim used by nfs-server-provisioner (example to 1Ti)
+kubectl -n openstudio-server patch pvc nfs-pvc-data \
+  -p '{"spec":{"resources":{"requests":{"storage":"1Ti"}}}}'
+
+# 4) Wait for resize to complete, then restart impacted services
+kubectl -n openstudio-server get pvc nfs-pvc-data -w
+kubectl -n openstudio-server exec deploy/openstudio-server-nfs-server-provisioner -- df -h /export
+kubectl -n openstudio-server rollout restart deploy/db deploy/web deploy/web-background
+kubectl -n openstudio-server rollout status deploy/db
+kubectl -n openstudio-server rollout status deploy/web
+```
+
+Notes:
+
+- 
`nfs-server-provisioner.persistence.size` controls backend capacity for all dynamic `nfs` claims.
+- `nfs_pvc.storage` should be configured smaller than `nfs-server-provisioner.persistence.size` (recommended 85-95%); requesting equal size can fail provisioning due to overhead/headroom checks.
+- `nfs_pvc.storage` is a request value; it is not an independent quota when backed by the same NFS server volume.
+- Existing PVC `storageClassName` is immutable. If migrating DB/Redis from NFS to block storage, use a planned migration window with backup/restore.
+
+### Reliability Preflight, Snapshot, and Helm Reconcile Automation
+
+Use `scripts/openstudio-reliability` to standardize triage and recovery steps:
+
+```bash
+# Read-only reliability checks (recommended first step)
+./scripts/openstudio-reliability --mode check
+
+# Capture queue/job snapshots before any mutation
+./scripts/openstudio-reliability --mode snapshot \
+  --snapshot-dir ./incident-snapshots/openstudio-server-$(date +%Y%m%d-%H%M%S)
+
+# Reconcile Helm only for managed-field conflict failures
+./scripts/openstudio-reliability --mode reconcile-helm --apply --allow-chart-apply
+
+# Recover stuck analyses (stale started jobs/datapoints; apply-gated)
+./scripts/openstudio-reliability --mode recover-stuck --stale-minutes 70 --apply
+```
+
+Design notes:
+
+- Script defaults to read-only mode.
+- Mutating operations require explicit `--apply`.
+- Snapshot mode captures queue depths and app job status for incident auditability.
+
+### Helm Failed-State Reconcile Playbook (SSA Conflicts)
+
+If `helm status` is `failed` while workloads are healthy, and the description includes managed-field conflict errors (for example `.spec.replicas` or HPA fields), use this sequence.
+
+Do **not** run reconcile if:
+
+- workloads are unstable (crashing, unavailable, or actively recovering),
+- failure reason is unknown or unrelated to managed-field conflicts,
+- local chart changes are unreviewed for production.
+
+```bash
+# 1) Confirm actual runtime health first
+kubectl -n openstudio-server get pods
+kubectl -n openstudio-server get deploy worker
+kubectl -n openstudio-server get hpa worker -o wide
+
+# 2) Confirm release failure reason
+helm status openstudio-server -n openstudio-server
+helm history openstudio-server -n openstudio-server
+
+# 3) Reconcile using guarded helper (includes dry-run preflight)
+./scripts/openstudio-reliability --mode reconcile-helm --apply --allow-chart-apply
+
+# 4) Validate release and runtime gates
+helm status openstudio-server -n openstudio-server
+kubectl -n openstudio-server get pods
+kubectl -n openstudio-server get hpa worker -o wide
+kubectl -n openstudio-server exec deploy/redis -- sh -lc 'PW="${REDIS_PASSWORD:-}"; AUTH=""; [ -n "$PW" ] && AUTH="-a $PW --no-auth-warning"; redis-cli $AUTH LLEN resque:queue:simulations'
+
+# 5) Roll back if release or runtime regresses
+helm rollback openstudio-server -n openstudio-server
+```
+
+### Stuck Analysis Recovery Playbook (Queue/State Divergence)
+
+If analyses remain in `started` while queues are empty or `requeued` backlog exists, use this sequence.
+
+```bash
+# 1) Confirm divergence and capture evidence
+./scripts/openstudio-reliability --mode check --stale-minutes 70
+./scripts/openstudio-reliability --mode snapshot \
+  --stale-minutes 70 \
+  --snapshot-dir ./incident-snapshots/openstudio-server-$(date +%Y%m%d-%H%M%S)
+
+# 2) Apply guarded recovery
+./scripts/openstudio-reliability --mode recover-stuck --stale-minutes 70 --apply
+
+# 3) Re-check health and convergence
+./scripts/openstudio-reliability --mode check --stale-minutes 70
+```
+
+Guardrails:
+
+- Recovery is apply-gated and uses a Redis lock to prevent concurrent remediation runs.
+- Recovery only mutates stale entries older than the configured threshold.
+- Batch-run jobs are finalized only when all datapoints are terminal.
+
+### Postmortem Template and Corrective-Action Checklist
+
+For each production incident, capture:
+
+1. Trigger, impact window, and user-visible symptoms.
+2. Root cause chain (technical + operational contributing factors).
+3. Detection latency and which alert should have fired earlier.
+4. Immediate mitigations applied and why they were chosen.
+5. Permanent fixes across defaults, automation, and docs.
+6. Drill plan and verification date for each corrective action.
+7. Owner per action item with objective completion criteria.
+
+Recent stuck-state retrospective findings (used for this runbook hardening):
+
+- Worker defaults consumed `simulations` but not `requeued`, allowing requeued work to stall indefinitely.
+- Infrastructure health (`helm status`, pod readiness) can remain green while app-level analysis state diverges.
+- Reliable recovery requires both queue remediation and state convergence checks (not just Helm reconcile).
+
+### Alerting Baseline (Recommended)
+
+Minimum production alerts to add in your platform monitoring:
+
+Metric | Warning | Critical | Rationale
+------ | ------- | -------- | ---------
+NFS `/export` free space | `<20%` | `<10%` | Early detection before DB/asset write failures.
+NFS fill projection (time-to-full) | `<7 days` | `<2 days` | Catch rapid growth even when free space still appears high.
+Redis `resque:queue:simulations` backlog age | `>15m` | `>30m` | Detect worker throughput mismatch.
+Queue/job divergence (`Job(status='queued')` with near-empty Redis queues) | `>5 queued for 10m` | `>20 queued for 10m` | Detect scheduler enqueue drift.
+Worker HPA saturation (`current/target` CPU) | `>90% for 10m` | `>95% for 15m` | Detect sustained compute bottleneck.
+Helm release state | `failed` | `failed for >15m` | Ensure operator metadata is reconciled quickly.
+
+### Reliability Drill Cadence
+
+Run a monthly drill that executes:
+
+1. `./scripts/openstudio-reliability --mode check`
+2. 
`./scripts/openstudio-reliability --mode snapshot --snapshot-dir `
+3. Helm reconcile dry procedure review (no mutation), then controlled reconcile in non-prod.
+4. Post-drill retrospective with action-item updates.
+
 ## Auto Scaling
 
 The worker pods are configured to auto-scale based on CPU threshold (default 12%). Once the aggregate CPU for all worker pods exceed the defined threshold (in this case 12%), the Kubernetes engine will start adding additional worker pods up to the maximum specified. This is also dependent on how the Kuebernetes cluster was configured as additional VM node instances will also be added. Please refer to the notes on [aws](/aws/README.md) and [google](/google/README.md) when setting up the cluster and note the instance type and maximum nodes specified.
 
-Once the aggregate CPU of the workers drop below 12%, the Kubernetes engine will start removing worker pod instances. There is a [prestop hook](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) configured in the worker pod to ensure that if a openstudio job is still active it will not terminate the pod until it is finished.
+Once the aggregate CPU of the workers drop below 12%, the Kubernetes engine will start removing worker pod instances. There is a [prestop hook](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) configured in the worker pod to drain resque workers and wait for active ruby/openstudio processes before termination. The wait behavior is bounded and configurable via `worker.container.preStop.*`.
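The bounded wait described for the worker preStop hook can be pictured as a poll loop with an upper limit. The sketch below is an illustration only and is not the chart's actual hook: the `wait_until_idle` function, its argument order, and the process pattern in the commented usage are assumptions. The 30-second interval and 5100-second bound are the documented defaults for `worker.container.preStop.pollIntervalSeconds` and `worker.container.preStop.maxWaitSeconds`.

```bash
# Hypothetical sketch: poll a "work still active?" command until it fails
# (nothing left to wait for) or until max_wait seconds have elapsed.
# Usage: wait_until_idle <poll_seconds> <max_wait_seconds> <check command...>
# Returns 0 when the check reports idle, 1 when the bound is reached.
wait_until_idle() {
  local poll="$1" max_wait="$2" waited=0
  shift 2
  while "$@"; do                      # check exits 0 while work is active
    if (( waited >= max_wait )); then
      return 1                        # bound reached; allow termination
    fi
    sleep "$poll"
    waited=$(( waited + poll ))
  done
  return 0
}

# Example with the documented defaults (the process pattern is an assumption):
#   wait_until_idle 30 5100 pgrep -f openstudio
```

Passing the check as a command keeps the loop independent of how activity is detected, so the same bound applies whether the check looks at processes or at queue state.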
diff --git a/aws/README.md b/aws/README.md
index 24393ee..97edf2f 100644
--- a/aws/README.md
+++ b/aws/README.md
@@ -95,7 +95,15 @@ As of Kubernetes version 1.23, to use EBS volumes you must install an EKS Add-On
 
 ## Connecting to your cluster using kubectl
 
-Once eksctl is done setting up the cluster, it will automatically setup the connection by creating a `~/.kube/config` file so you and can begin using helm and kubectl cli tools to communicate to the cluster. occasionally, you need to run generate this config manually. If you are not able to run `kubectl get nodes` you can re-run the kube config setup by running `aws eks update-kubeconfig --name openstudio-server` Change the `--name` to match the cluster name if different from the example. Now that the cluster is ready, you can now deploy the helm chart. Please refer the main README.md doc for deploying the helm chart.
+Once eksctl is done setting up the cluster, it automatically sets up the connection by creating a `~/.kube/config` file so you can begin using the helm and kubectl CLI tools to communicate with the cluster. Occasionally, you need to generate this config manually. If you are not able to run `kubectl get nodes`, you can re-run the kubeconfig setup with `aws eks update-kubeconfig --name openstudio-server`. Change the `--name` value to match the cluster name if it differs from the example. Now that the cluster is ready, you can deploy the helm chart. Please refer to the main README.md for deploying the helm chart.
+
+After deployment, retrieve the OpenStudio Server external endpoint with:
+
+```bash
+kubectl get svc ingress-load-balancer -n openstudio-server
+```
+
+Use the `EXTERNAL-IP` value in PAT under **Existing Server URL**.
 
 ## Delete the cluster using eksctl
 
@@ -127,4 +135,3 @@ This cmd should return no clusters.
You can also use the web console in your AWS
 
 If the cluster didn't get fully deleted, go to [CloudFormation](console.aws.amazon.com/cloudformation) and manually delete the cluster stack.
 
-
diff --git a/azure/README.md b/azure/README.md
index 4863a00..8c8d41f 100644
--- a/azure/README.md
+++ b/azure/README.md
@@ -74,6 +74,14 @@ aks-nodepool1-23944537-vmss000002 Ready agent 12m v1.18.14
 
 The cluster is now ready to deploy the helm chart. Please refer to the helm [README.md](../README.md) to deploy the openstudio-server helm chart.
 
+After deployment, retrieve the OpenStudio Server external endpoint with:
+
+```bash
+kubectl get svc ingress-load-balancer -n openstudio-server
+```
+
+Use the `EXTERNAL-IP` value in PAT under **Existing Server URL**.
+
 ## Delete cluster
 
 When you are finished and you can simply delete the entire cluster.
diff --git a/google/README.md b/google/README.md
index 3009f05..7f3b410 100644
--- a/google/README.md
+++ b/google/README.md
@@ -243,6 +243,14 @@ NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE
 openstudio-server us-west1-a 1.14.10-gke.27 35.230.92.87 n1-standard-4 1.14.10-gke.27 3 RUNNING
 ```
 
+After deployment, retrieve the OpenStudio Server external endpoint with:
+
+```bash
+kubectl get svc ingress-load-balancer -n openstudio-server
+```
+
+Use the `EXTERNAL-IP` value in PAT under **Existing Server URL**.
+
 ## Delete cluster
 
 When you are finished and you can simply delete the entire cluster.