Commit 0edd8fc

Add documentation on how to upgrade from scratchfs to swap (#34031)
Co-authored-by: Pranshu Maheshwari <[email protected]>
1 parent bf15621 commit 0edd8fc

File tree

7 files changed

+169
-82
lines changed

doc/user/content/installation/install-on-aws/appendix-deployment-guidelines.md

Lines changed: 8 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -33,38 +33,19 @@ when operating on datasets larger than main memory as well as allows for a more
3333
graceful degradation rather than OOMing. Network-attached storage (like EBS
3434
volumes) can significantly degrade performance and is not supported.
3535

36-
*Starting in v0.3.1 of Materialize on AWS Terraform*, disk support (using
37-
OpenEBS and NVMe instance storage) is enabled, by default, for Materialize. With
38-
this change, the Terraform:
36+
Starting in v0.6.1 of Materialize on AWS Terraform,
37+
disk support (using swap on NVMe instance storage) may be enabled for Materialize.
38+
With this change, the Terraform:
3939

40-
- Installs OpenEBS via Helm;
41-
42-
- Configures NVMe instance store volumes using a bootstrap script;
43-
44-
- Creates appropriate storage classes for Materialize.
45-
46-
Associated with this change,
40+
- Creates a node group for Materialize.
41+
- Configures NVMe instance store volumes as swap using a daemonset.
42+
- Enables swap at the Kubelet.
4743

4844
- The following configuration options are available:
4945

50-
- [`enable_disk_support`]
51-
- [`disk_support_config`]
52-
53-
- The default [`node_group_instance_types`] has changed from `"r8g.2xlarge"` to
54-
`"r7gd.2xlarge"`. See [Recommended instance
55-
types](#recommended-instance-types).
56-
57-
[enable disk support]:
58-
https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#disk-support-for-materialize
59-
60-
[`enable_disk_support`]:
61-
https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_enable_disk_support
62-
63-
[`disk_support_config`]:
64-
https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_disk_support_config
46+
- [`swap_enabled`]
6547

66-
[`node_group_instance_types`]:
67-
https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_node_group_instance_types
48+
See [Upgrade Notes](https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#v061)
6849

6950

7051
## CPU affinity

doc/user/content/installation/install-on-azure/appendix-deployment-guidelines.md

Lines changed: 8 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -47,28 +47,19 @@ when operating on datasets larger than main memory as well as allows for a more
4747
graceful degradation rather than OOMing. Network-attached storage (like EBS
4848
volumes) can significantly degrade performance and is not supported.
4949

50-
Starting in v0.4.0 of Materialize on Azure Terraform, disk support (using
51-
OpenEBS and NVMe instance storage) is enabled, by default, for Materialize. With
52-
this change, the Terraform:
50+
Starting in v0.6.1 of Materialize on Azure Terraform,
51+
disk support (using swap on NVMe instance storage) may be enabled for Materialize.
52+
With this change, the Terraform:
5353

54-
- Installs OpenEBS via Helm;
55-
56-
- Configures NVMe instance store volumes using a bootstrap script;
57-
58-
- Creates appropriate storage classes for Materialize.
59-
60-
Associated with this change:
54+
- Creates a node group for Materialize.
55+
- Configures NVMe instance store volumes as swap using a daemonset.
56+
- Enables swap at the Kubelet.
6157

6258
- The following configuration options are available:
6359

64-
- [`enable_disk_support`]
65-
- [`disk_support_config`]
66-
- [`disk_setup_image`]
60+
- [`swap_enabled`]
6761

68-
- The default
69-
[`aks_config.vm_size`](https://github.com/MaterializeInc/terraform-azurerm-materialize?tab=readme-ov-file#input_aks_config)
70-
has changed from `Standard_E8ps_v6` to `Standard_E4pds_v6`. See [Recommended
71-
instance types](#recommended-instance-types).
62+
See [Upgrade Notes](https://github.com/MaterializeInc/terraform-azurerm-materialize?tab=readme-ov-file#v061)
7263

7364
## Recommended Azure Blob Storage
7465

doc/user/content/installation/install-on-gcp/appendix-deployment-guidelines.md

Lines changed: 8 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -75,37 +75,19 @@ when operating on datasets larger than main memory as well as allows for a more
7575
graceful degradation rather than OOMing. Network-attached storage (like EBS
7676
volumes) can significantly degrade performance and is not supported.
7777

78-
Starting in v0.4.0 of Materialize on Google Cloud Provider (GCP) Terraform,
79-
disk support (using OpenEBS and NVMe instance storage) is enabled, by default,
80-
for Materialize. With this change, the Terraform:
78+
Starting in v0.6.1 of Materialize on Google Cloud Provider (GCP) Terraform,
79+
disk support (using swap on NVMe instance storage) may be enabled for Materialize.
80+
With this change, the Terraform:
8181

82-
- Installs OpenEBS via Helm;
83-
84-
- Configures NVMe instance store volumes using a bootstrap script;
85-
86-
- Creates appropriate storage classes for Materialize.
87-
88-
Associated with this change:
82+
- Creates a node group for Materialize.
83+
- Configures NVMe instance store volumes as swap using a daemonset.
84+
- Enables swap at the Kubelet.
8985

9086
- The following configuration options are available:
9187

92-
- [`enable_disk_support`]
93-
- [`disk_support_config`]
94-
95-
- The default [`gke_config.machine_type`] has changed from `e2-standard-4` to
96-
`n2-highmem-8`. See [Recommended instance types](#recommended-instance-types).
97-
98-
[enable disk support]:
99-
https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#disk-support-for-materialize-on-gcp
100-
101-
[`enable_disk_support`]:
102-
https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#input_enable_disk_support
103-
104-
[`disk_support_config`]:
105-
https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#input_disk_support_config
88+
- [`swap_enabled`]
10689

107-
[`gke_config.machine_type`]:
108-
https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#input_gke_config
90+
See [Upgrade Notes](https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#v061)
10991

11092
## CPU affinity
11193

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
---
2+
title: "Guide: Node preparation for swap and upgrading to v26"
3+
description: "Procedure for upgrading to v26, which has swap enabled by default."
4+
menu:
5+
main:
6+
parent: "installation"
7+
weight: 69
8+
---
9+
10+
Swap allows for infrequently accessed data to be moved from memory to disk. Enabling swap reduces the memory required to operate Materialize and improves cost efficiency. Upgrades to v26 and later have swap enabled by default.
11+
12+
## Upgrading to v26 with swap requires node preparation
13+
We've added new labels to the node selectors for clusterd pods to enable smooth upgrades. As a result, if you are running v25.2.12 or earlier, your existing nodes will not match these selectors and won't be selected to run the pods. Before upgrading to v26, you must prepare your nodes by adding the required labels.
14+
15+
## Preparing for the upgrade using Terraform
16+
v0.6.1 of the Materialize Terraform modules can handle much of the preparation work for you. If you use our Terraform modules, follow the instructions in the respective upgrade notes:
17+
- [AWS](https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#v061)
18+
- [GCP](https://github.com/MaterializeInc/terraform-google-materialize?tab=readme-ov-file#v061)
19+
- [Azure](https://github.com/MaterializeInc/terraform-azurerm-materialize?tab=readme-ov-file#v061)
20+
21+
## Preparing for the upgrade without Terraform
22+
23+
1. Label existing scratchfs/lgalloc node groups
24+
25+
If using lgalloc on scratchfs volumes, you must add the additional `"materialize.cloud/scratch-fs": "true"` label to your existing node groups and nodes running Materialize workloads.
26+
27+
Adding this label to the node group (or nodepool) configuration applies it to newly spawned nodes but, depending on your cloud provider, may not apply it to existing nodes.
28+
29+
If not automatically applied, you may need to use `kubectl label` to apply the change to existing nodes.
30+
31+
1. Modify existing scratchfs/lgalloc disk setup daemonset selector labels
32+
33+
If using our [ephemeral-storage-setup image](https://github.com/MaterializeInc/ephemeral-storage-setup-image/) as a daemonset to configure scratchfs LVM volumes for lgalloc, you must add the additional `"materialize.cloud/scratch-fs": "true"` label in multiple places:
34+
* `spec.selector.matchLabels`
35+
* `spec.template.metadata.labels`
36+
* (if using `nodeAffinity`) `spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms`
37+
* (if using `nodeSelector`) `spec.template.spec.nodeSelector`
38+
39+
You **must** use at least one of `nodeAffinity` or `nodeSelector`.
40+
41+
It is recommended to rename this daemonset to make it clear that it is only for the legacy scratchfs/lgalloc nodes (for example, change the name `disk-setup` to `disk-setup-scratchfs`).
42+
43+
1. Create a new node group for swap
44+
45+
1. Create a new node group (or EC2NodeClass and NodePool if using Karpenter in AWS) using an instance type with local NVMe disks. In GCP, the disks must be in `raw` mode.
46+
47+
1. Label the node group with `"materialize.cloud/swap": "true"`.
48+
49+
1. If using AWS Bottlerocket AMIs (highly recommended if running in AWS), set the following in the userdata to configure the disks for swap, and enable swap in the kubelet:
50+
51+
```toml
52+
[settings.oci-defaults.resource-limits.max-open-files]
53+
soft-limit = 1048576
54+
hard-limit = 1048576
55+
56+
[settings.bootstrap-containers.diskstrap]
57+
source = "docker.io/materialize/ephemeral-storage-setup-image:v0.4.0"
58+
mode = "once"
59+
essential = "true"
60+
# ["swap", "--cloud-provider", "aws", "--bottlerocket-enable-swap"]
61+
user-data = "WyJzd2FwIiwgIi0tY2xvdWQtcHJvdmlkZXIiLCAiYXdzIiwgIi0tYm90dGxlcm9ja2V0LWVuYWJsZS1zd2FwIl0="
62+
63+
[kernel.sysctl]
64+
"vm.swappiness" = "100"
65+
"vm.min_free_kbytes" = "1048576"
66+
"vm.watermark_scale_factor" = "100"
67+
```
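The `user-data` value in the TOML above is the base64 encoding of the JSON argument list shown in the comment on the preceding line. A minimal Python sketch of that encoding follows, in case you need to regenerate or inspect the value. The helper names `encode_user_data` and `decode_user_data` are hypothetical and not part of any Materialize or Bottlerocket tooling.

```python
import base64
import json


def encode_user_data(args):
    """Serialize the container arguments as JSON, then base64-encode them."""
    return base64.b64encode(json.dumps(args).encode("utf-8")).decode("ascii")


def decode_user_data(encoded):
    """Reverse of encode_user_data: base64-decode, then parse the JSON list."""
    return json.loads(base64.b64decode(encoded))


# Argument list taken from the comment in the TOML snippet.
SWAP_ARGS = ["swap", "--cloud-provider", "aws", "--bottlerocket-enable-swap"]

encoded = encode_user_data(SWAP_ARGS)
print(encoded)
print(decode_user_data(encoded))
```

Decoding an existing value before changing it is a quick way to confirm which arguments a node group currently passes to the bootstrap container.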
68+
69+
1. If not using AWS or not using Bottlerocket AMIs, and your node group supports it (Azure does not as of 2025-11-05), add a startup taint. This taint will be removed after the disk is configured for swap.
70+
71+
```yaml
72+
taints:
73+
- key: startup-taint.cluster-autoscaler.kubernetes.io/disk-unconfigured
74+
value: "true"
75+
effect: NoSchedule
76+
```
77+
78+
1. Create a new disk-setup-swap daemonset
79+
80+
If using Bottlerocket AMIs in AWS, you may skip this step, as you should have configured swap using userdata previously.
81+
82+
Create a new daemonset using our [ephemeral-storage-setup image](https://github.com/MaterializeInc/ephemeral-storage-setup-image/) to configure the disks for swap and to enable swap in the kubelet.
83+
84+
The arguments to the init container in this daemonset need to be configured for swap. See the examples in the linked git repository for more details.
85+
86+
This daemonset should run only on the new swap nodes, so we need to ensure it has the `"materialize.cloud/swap": "true"` label in several places:
87+
88+
* `spec.selector.matchLabels`
89+
* `spec.template.metadata.labels`
90+
* (if using `nodeAffinity`) `spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms`
91+
* (if using `nodeSelector`) `spec.template.spec.nodeSelector`
92+
93+
You **must** use at least one of `nodeAffinity` or `nodeSelector`.
94+
95+
It is recommended to name this daemonset to clearly indicate that it is for configuring swap (for example, `disk-setup-swap`), as opposed to other disk configurations.
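Because the label has to appear in several places, it can help to check a manifest before applying it. The following is a rough Python sketch, not an official tool. It assumes the daemonset manifest has already been loaded into a dict, and that any `nodeAffinity` uses `matchExpressions` with the `In` operator. The function name `missing_label_locations` is hypothetical. The same check works for the scratchfs daemonset by passing `materialize.cloud/scratch-fs` as the key.

```python
def missing_label_locations(daemonset, key, value="true"):
    """Return the places where a DaemonSet manifest (as a dict) lacks the label."""
    spec = daemonset.get("spec", {})
    template = spec.get("template", {})
    pod_spec = template.get("spec", {})
    missing = []

    if spec.get("selector", {}).get("matchLabels", {}).get(key) != value:
        missing.append("spec.selector.matchLabels")
    if template.get("metadata", {}).get("labels", {}).get(key) != value:
        missing.append("spec.template.metadata.labels")

    # nodeSelector: a plain key/value map on the pod spec.
    in_node_selector = pod_spec.get("nodeSelector", {}).get(key) == value

    # nodeAffinity: terms are ORed, so every term must require the label.
    terms = (
        pod_spec.get("affinity", {})
        .get("nodeAffinity", {})
        .get("requiredDuringSchedulingIgnoredDuringExecution", {})
        .get("nodeSelectorTerms", [])
    )

    def term_requires_label(term):
        return any(
            expr.get("key") == key
            and expr.get("operator") == "In"
            and expr.get("values") == [value]
            for expr in term.get("matchExpressions", [])
        )

    in_node_affinity = bool(terms) and all(term_requires_label(t) for t in terms)

    # At least one of nodeSelector or nodeAffinity must restrict scheduling.
    if not (in_node_selector or in_node_affinity):
        missing.append("spec.template.spec.nodeSelector or nodeAffinity")
    return missing


SWAP_LABEL = "materialize.cloud/swap"

# Illustrative manifest fragment; only the fields relevant to the check.
daemonset = {
    "metadata": {"name": "disk-setup-swap"},
    "spec": {
        "selector": {"matchLabels": {SWAP_LABEL: "true"}},
        "template": {
            "metadata": {"labels": {SWAP_LABEL: "true"}},
            "spec": {"nodeSelector": {SWAP_LABEL: "true"}},
        },
    },
}

print(missing_label_locations(daemonset, SWAP_LABEL))  # prints []
```

An empty list means the label is present in every required location; otherwise the list names the locations still to be fixed.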
96+
97+
1. (Optional) Configure environmentd to also use swap
98+
99+
Swap is enabled by default for clusterd, but not for environmentd. If you'd like to enable swap for environmentd, add `"materialize.cloud/swap": "true"` to the `environmentd.node_selector` helm value.
100+
101+
1. Upgrade the Materialize operator helm chart to v26
102+
103+
The cluster size definitions for existing Materialize instances will not be changed at this point, but any newly created or upgraded Materialize instances will pick up the new sizes.
104+
105+
Do not create any new Materialize instances at versions earlier than v26, or perform any rollouts of existing Materialize instances to versions earlier than v26.
106+
107+
1. Upgrade existing Materialize instances to v26
108+
109+
The new v26 pods should go to the new swap nodes.
110+
111+
You can verify that swap is enabled and working by `exec`ing into a clusterd pod and running `cat /sys/fs/cgroup/memory.swap.max`. If you get a number greater than 0, swap is enabled and the pod is allowed to use it.
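If you script this check, note that under cgroup v2 `memory.swap.max` can also contain the literal string `max`, which means no swap limit is set. The sketch below shows one way to interpret the file's contents. It assumes you have already captured the output of the `cat` command as a string, and the function name `swap_allowed` is hypothetical.

```python
def swap_allowed(raw_value):
    """Interpret the contents of /sys/fs/cgroup/memory.swap.max.

    Returns True when the cgroup may use swap: either a byte limit
    greater than 0, or the literal "max" (no limit).
    """
    value = raw_value.strip()
    if value == "max":
        return True
    return int(value) > 0


# Sample values as they might be read from the file (trailing newline included).
for sample in ("0\n", "4294967296\n", "max\n"):
    print(repr(sample), swap_allowed(sample))
```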
112+
113+
1. (Optional) Delete old scratchfs/lgalloc node groups and disk-setup-scratchfs daemonset
114+
115+
If you no longer have anything running on the old scratchfs/lgalloc nodes, you may delete their node group and the disk-setup-scratchfs daemonset.
116+
117+
## How to disable swap
118+
If you wish to opt out of swap and retain the old behavior, you may set `operator.clusters.swap_enabled: false` in your helm values.

doc/user/data/self_managed/aws_terraform_versions.yml

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,25 +4,30 @@ columns:
44

55
rows:
66
- "Terraform version": |
7-
[v0.5.5](https://github.com/MaterializeInc/terraform-aws-materialize/releases/)
7+
[v0.6.1](https://github.com/MaterializeInc/terraform-aws-materialize/releases/tag/v0.6.1)
8+
"Notable changes": |
9+
- Initial swap support. See upgrade notes for details.
10+
11+
- "Terraform version": |
12+
[v0.5.5](https://github.com/MaterializeInc/terraform-aws-materialize/releases/tag/v0.5.5)
813
"Notable changes": |
914
- Uses `terraform-helm-materialize` v0.1.26.
1015
1116
- "Terraform version": |
12-
[v0.5.4](https://github.com/MaterializeInc/terraform-aws-materialize/releases/)
17+
[v0.5.4](https://github.com/MaterializeInc/terraform-aws-materialize/releases/tag/v0.5.4)
1318
"Notable changes": |
1419
- Uses `terraform-helm-materialize` v0.1.25.
1520
1621
- "Terraform version": |
17-
[v0.4.9](https://github.com/MaterializeInc/terraform-aws-materialize/releases/)
22+
[v0.4.9](https://github.com/MaterializeInc/terraform-aws-materialize/releases/tag/v0.4.9)
1823
"Notable changes": |
1924
2025
- Uses `terraform-helm-materialize` v0.1.19.
2126
- Bumps Materialize release to [self-managed 25.2](/releases)
2227
- Adds support for password authentication and enabling RBAC
2328
2429
- "Terraform version": |
25-
[v0.4.6](https://github.com/MaterializeInc/terraform-aws-materialize/releases/)
30+
[v0.4.6](https://github.com/MaterializeInc/terraform-aws-materialize/releases/tag/v0.4.6)
2631
"Notable changes": |
2732
2833
- Adds support for passing in additional Materialize instance configuration

doc/user/data/self_managed/azure_terraform_versions.yml

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,25 +4,30 @@ columns:
44

55
rows:
66
- "Terraform version": |
7-
[v0.5.5](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/)
7+
[v0.6.1](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/tag/v0.6.1)
8+
"Notable changes": |
9+
- Initial swap support. See upgrade notes for details.
10+
11+
- "Terraform version": |
12+
[v0.5.5](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/tag/v0.5.5)
813
"Notable changes": |
914
- Uses `terraform-helm-materialize` v0.1.26.
1015
1116
- "Terraform version": |
12-
[v0.5.4](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/)
17+
[v0.5.4](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/tag/v0.5.4)
1318
"Notable changes": |
1419
- Uses `terraform-helm-materialize` v0.1.25.
1520
1621
- "Terraform version": |
17-
[v0.4.6](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/)
22+
[v0.4.6](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/tag/v0.4.6)
1823
"Notable changes": |
1924
2025
- Uses `terraform-helm-materialize` v0.1.19.
2126
- Bumps Materialize release to [self-managed 25.2](/releases)
2227
- Adds support for password authentication and enabling RBAC
2328
2429
- "Terraform version": |
25-
[v0.4.3](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/)
30+
[v0.4.3](https://github.com/MaterializeInc/terraform-azurerm-materialize/releases/tag/v0.4.3)
2631
"Notable changes": |
2732
2833
- Adds support for passing in additional Materialize instance configuration

doc/user/data/self_managed/gcp_terraform_versions.yml

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,25 +4,30 @@ columns:
44

55
rows:
66
- "Terraform version": |
7-
[v0.5.5](https://github.com/MaterializeInc/terraform-google-materialize/releases/)
7+
[v0.6.1](https://github.com/MaterializeInc/terraform-google-materialize/releases/tag/v0.6.1)
8+
"Notable changes": |
9+
- Initial swap support. See upgrade notes for details.
10+
11+
- "Terraform version": |
12+
[v0.5.5](https://github.com/MaterializeInc/terraform-google-materialize/releases/tag/v0.5.5)
813
"Notable changes": |
914
- Uses `terraform-helm-materialize` v0.1.26.
1015
1116
- "Terraform version": |
12-
[v0.5.4](https://github.com/MaterializeInc/terraform-google-materialize/releases/)
17+
[v0.5.4](https://github.com/MaterializeInc/terraform-google-materialize/releases/tag/v0.5.4)
1318
"Notable changes": |
1419
- Uses `terraform-helm-materialize` v0.1.25.
1520
1621
- "Terraform version": |
17-
[v0.4.6](https://github.com/MaterializeInc/terraform-google-materialize/releases/)
22+
[v0.4.6](https://github.com/MaterializeInc/terraform-google-materialize/releases/tag/v0.4.6)
1823
"Notable changes": |
1924
2025
- Uses `terraform-helm-materialize` v0.1.19.
2126
- Bumps Materialize release to [self-managed 25.2](/releases)
2227
- Adds support for password authentication and enabling RBAC
2328
2429
- "Terraform version": |
25-
[v0.4.3](https://github.com/MaterializeInc/terraform-google-materialize/releases/)
30+
[v0.4.3](https://github.com/MaterializeInc/terraform-google-materialize/releases/tag/v0.4.3)
2631
"Notable changes": |
2732
2833
- Adds support for passing in additional Materialize instance configuration
