# IaC to create and manage GKE on GCP

## Prerequisites

- Terraform
- gcloud CLI
- GCP project with billing enabled
## Setup

1. Authenticate with GCP and set up your project:

   ```shell
   ./setup.sh <project_id>
   ```

   Replace `<project_id>` with your actual GCP project ID.

2. Edit `dev.tfvars` as needed to customize your cluster parameters.
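If you prefer to run the authentication step manually, a minimal sketch using standard gcloud commands follows; it assumes `setup.sh` wraps something similar, so check the script itself for the authoritative steps:

```shell
# Authenticate your user account and select the project.
gcloud auth login
gcloud config set project <project_id>

# Set up Application Default Credentials so Terraform's
# Google provider can authenticate.
gcloud auth application-default login
```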
## Terraform workflow

```shell
terraform init
terraform plan -var-file=dev.tfvars
terraform apply -var-file=dev.tfvars
terraform destroy -var-file=dev.tfvars
```

## KServe installation

To deploy machine learning models with KServe, you need to install several components in your Kubernetes cluster:
- Knative Serving: Provides serverless deployment and scaling for model inference services.
- Istio: Acts as the networking layer for Knative, enabling advanced traffic management.
- cert-manager: Manages certificates for secure communication.
- KServe: The core framework for serving ML models on Kubernetes.
A helper script is provided to automate the installation of these components:

```shell
cd k8s/kserve
./install-kserve.sh
```

This script will:
- Install Knative Serving CRDs and core components
- Install Istio and configure it for Knative
- Install cert-manager using Helm
- Create the `kserve` namespace
- Install KServe CRDs and KServe itself using Helm
You can review or modify the script at `k8s/kserve/install-kserve.sh`.
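After the script completes, you can sanity-check the installation with standard kubectl commands; the namespace names below assume the defaults used by Knative Serving, Istio, cert-manager, and KServe:

```shell
# All pods in each component namespace should reach Running/Completed.
kubectl get pods -n knative-serving
kubectl get pods -n istio-system
kubectl get pods -n cert-manager
kubectl get pods -n kserve

# Confirm the KServe CRDs were registered.
kubectl get crd | grep -i kserve
```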
## Load testing

This repository includes a Kubernetes Job for load testing KServe model endpoints using Vegeta.
- Deploy the KServe sample model defined in `k8s/kserve/sample-model/sklearn.yaml`.
- The load test job is defined in `k8s/kserve/perf-test.yaml`.
- It uses a container running Vegeta to send POST requests to the `sklearn-iris` model endpoint deployed via KServe.
- The test parameters (duration, rate, CPUs) and request payload are configurable in the ConfigMap within the same YAML file.
- To run the load test, apply the manifest to your cluster:

  ```shell
  kubectl apply -f k8s/kserve/perf-test.yaml
  ```

- The job will generate a text report summarizing the performance of the model endpoint.
- You can modify the target endpoint or payload by editing the `cfg` and `payload` sections in the ConfigMap.
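A typical run of the steps above might look like the following sketch. The job name `perf-test` is an assumption for illustration; use the actual name defined in `k8s/kserve/perf-test.yaml`:

```shell
# Start the load test Job.
kubectl apply -f k8s/kserve/perf-test.yaml

# Wait for the Job to finish (job name "perf-test" is an assumption).
kubectl wait --for=condition=complete job/perf-test --timeout=300s

# Read the Vegeta text report from the Job's pod logs.
kubectl logs job/perf-test
```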
## Notes

- The GKE version is controlled by the `gke_version_prefix` variable in `dev.tfvars`.
- Providers are configured in `providers.tf`.
- Cluster and endpoint outputs are available after apply.
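The outputs can be used to point kubectl at the new cluster. A sketch with standard commands follows; the actual output names depend on this module's `outputs.tf`, and the `<cluster_name>`, `<region>`, and `<project_id>` placeholders must be filled in from your configuration:

```shell
# List all Terraform outputs after apply.
terraform output

# Fetch kubeconfig credentials for the new GKE cluster.
gcloud container clusters get-credentials <cluster_name> \
  --region <region> --project <project_id>

# Verify connectivity to the cluster.
kubectl get nodes
```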