Add offline backup for foremanctl#507
Conversation
| # Critical volumes to backup | ||
| critical_podman_volumes: | ||
| - iop-core-kafka-data | ||
| - iop-service-vmaas-data |
There was a problem hiding this comment.
I have not tested IOP backups outside of DB extensively with this. Will need some eyes here.
15ba9dd to
c6de1eb
Compare
ianballou
left a comment
There was a problem hiding this comment.
I did an automated test run, take these with a grain of salt, but I wanted to post early in case they're real for extra time to test and fix.
Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:
Bug #1: podman_network.yaml fails when no custom networks exist
File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:
# Option A: tolerate empty result
failed_when: networks_json.rc not in [0, 1]
# Option B: check first, skip if no custom networksBug #2: Wrong Foreman tasks API endpoint
File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.
Bug #3: pg_isready and pg_dump not available on host
... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.
Bug #4: Hardcoded parameters.yaml path in metadata task
File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.
|
Will update the tasks endpoint and parameters.yaml path.. About the podman networks, those are created for IOP.. Like https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml so I'd assume we have that present in production deployments. We can add some handling for when it's not. |
|
Maybe nitpick but does it make sense to include the not yet implemented flags when doing |
I am fine either way but it's helpful guidance for future PRs and documentation to look at. |
| @@ -0,0 +1,40 @@ | |||
| --- | |||
| - name: List podman networks | |||
There was a problem hiding this comment.
Are podman networks something we need to backup?
There was a problem hiding this comment.
Required for IOP..Not sure if we eed backup for this..I guess restore can just rely on foremanctl deploy if it can and backup files here can be for reference/verification.
There was a problem hiding this comment.
foremanctl can redeploy all of that, no need to backup
| - name: Export critical volumes | ||
| ansible.builtin.command: | ||
| cmd: podman volume export {{ item }} -o {{ backup_dir_full }}/volume-{{ item }}.tar | ||
| loop: "{{ critical_podman_volumes }}" |
There was a problem hiding this comment.
Cool, just backs up the important volumes the foreman-maintain backs up today.
| @@ -0,0 +1,190 @@ | |||
| --- | |||
| # Preflight checks for backup operation | |||
There was a problem hiding this comment.
Part of me wonders if the checks should be abstracted out to roles if they will be helpful outside of backup. I suppose we could cross that bridge when we get there. Eventually I envision foremanctl having a library of checks much like foreman-maintain. These checks will go into different playbooks, but even then the checks may run or not run depending on the configuration. Certain flavors or configurations will make some checks applicable and others not.
There is src/roles/checks/ today, which looks like is meant to be a single spot for all checks.
If any of these checks seem applicable to deployment or perhaps even health which is going to be implemented soon, it might be worth starting to pull the checks out somewhere.
There was a problem hiding this comment.
The main things here are checking for active tasks in foreman and pulp and running amcheck on DBs to be backed up..The Db index checks can be helpful for health and generic checks. Will move those into the checks/ roles. 👍🏼
| @@ -0,0 +1,28 @@ | |||
| --- | |||
| - name: Backup podman quadlet container definitions | |||
There was a problem hiding this comment.
I'm thinking about what we would do with quadlet files. These files will contain secrets, volume info, image references, FQDNs. Perhaps some of these we can expect to remain the same, but is the volume info and image info going to remain the same?
Currently, we do not enforce that the z-stream between versions of Foreman/Satellite have to be the same, which tells me these quadlet files may need to be regenerated within the context of the new environment.
There was a problem hiding this comment.
Ack..The restore would generate these on the system with foremanctl deploy..I am wondering if there's value in backing up the container definitions regardless for reference..
There was a problem hiding this comment.
Yeah, I suppose it wouldn't hurt as long as their collection is simple and safe.
| register: foremanctl_state_dir | ||
|
|
||
| - name: Backup foremanctl state directory | ||
| community.general.archive: |
There was a problem hiding this comment.
In order to support incremental exports, tar --listed-incremental needs to be used. It seems this is not supported by community.general.archive, so it might be better to avoid its use altogether.
- name: Backup config files
ansible.builtin.command:
cmd: >
tar --create --gzip
--listed-incremental={{ backup_dir_full }}/.config.snar
--ignore-failed-read
--file {{ backup_dir_full }}/foremanctl_state.tar.gz
{{ config_file_paths | join(' ') }}Also we use config_files but foremanctl-state, bit of an inconsistency there.
There was a problem hiding this comment.
there is no need to do incrementals for the configs. (yes, we do them today in foreman maintain, but I think it's more work than use)
| dest: "{{ backup_dir_full }}/quadlet-files.tar.gz" | ||
| format: gz | ||
| mode: '0644' | ||
| when: database_mode == 'internal' |
There was a problem hiding this comment.
I've seen this in a few spots, why is this scoped to only internal database? This seems unrelated to the database.
There was a problem hiding this comment.
I did try this only on internal DB so the scoping was intentional. However this is not needed on non-DB backups..WIll update. 👍🏼
4da7174 to
10c3e00
Compare
| @@ -0,0 +1,237 @@ | |||
| --- | |||
| # Detect which databases exist on the system | |||
There was a problem hiding this comment.
Can't we take the info from the enabled features?
There was a problem hiding this comment.
+1, I think foreman-maintain previously did a lot of discovery, but with foremanctl we should be able to rely on the saved user configuration.
| - item.stat.exists | ||
| - item.stat.size > 0 | ||
| fail_msg: >- | ||
| Database dump failed or produced empty file: |
There was a problem hiding this comment.
if it failed, pg_dump would have existed non zero, right? then we'd never reach this step.
| --- | ||
| - name: Check if foremanctl state directory exists | ||
| ansible.builtin.stat: | ||
| path: "{{ lookup('env', 'OBSAH_STATE') }}" |
There was a problem hiding this comment.
this is already available via the obsah_state_path variable (see theforeman/obsah#86)
| format: gz | ||
| mode: '0644' | ||
| exclude_path: | ||
| - "{{ lookup('env', 'OBSAH_STATE') }}/certs" |
There was a problem hiding this comment.
For some reason I was thinking podman secrets would be enough to get these but I think that was incorrect..Will back this up..
| - name: Get hostname | ||
| ansible.builtin.command: | ||
| cmd: hostname -f | ||
| register: hostname_result |
|
|
||
| - name: Get OS version | ||
| ansible.builtin.command: | ||
| cmd: cat /etc/redhat-release |
There was a problem hiding this comment.
I am sure we can get that from ansible directly
|
|
||
| - name: Gather package facts | ||
| ansible.builtin.package_facts: | ||
| manager: rpm |
There was a problem hiding this comment.
do we need to force rpm here? I would have expected no, and then foremanctl will continue working on Debian
|
|
||
| - name: Query container images | ||
| ansible.builtin.command: | ||
| cmd: podman images --format json |
There was a problem hiding this comment.
https://docs.ansible.com/projects/ansible/latest/collections/containers/podman/podman_image_info_module.html seems like the better fit?
| - parameters_slurp is succeeded | ||
| - parameters_slurp.content is defined | ||
|
|
||
| - name: Set enabled features from parameters |
There was a problem hiding this comment.
you should have access to enabled_features var already
| @@ -0,0 +1,112 @@ | |||
| --- | |||
| # Backup podman secrets | |||
There was a problem hiding this comment.
all these come from foremanctl data, no need to backup these
There was a problem hiding this comment.
The only secrets I was thinking of that need preserving are pulp-django-secret-key and pulp-symmetric-key which we'd need for pulp data restore..Those are saved in /var/lib/pulp/ which are backing up so it would be restored from there. Will remove the secrets and networks backup here.. 👍🏼
| --- | ||
| - name: List podman volumes | ||
| ansible.builtin.command: | ||
| cmd: podman volume ls --format {% raw %}'{{.Name}}'{% endraw %} |
There was a problem hiding this comment.
I am sure there is a module in containers.podman for this
|
not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :( |
|
Based on direction of reviews, I am thinking that we do not need to backup anything which can be safely regenerated by foremanctl deploy..That would mean I can take out podman_secrets, podman networks, podman volumes, quadlet and systemd file backups. That would leave us with DBs, /var/lib/pulp for pulp content and foremanctl state .. Am I missing anything? |
a759002 to
90c76de
Compare
90c76de to
fc68cdc
Compare
|
Updated PR to drop backups for podman network, volume and secrets. We can rely on deploy to recreate these. The backup now only backs up DBs, pulp content and foremanctl state.. Also addressed some reviews around using modules where applicable. |
ianballou
left a comment
There was a problem hiding this comment.
Just a couple of small bugs, once I worked through them I got a backup!
$ ls /var/tmp/foreman-backup-test/foreman-backup-20260520T193319/
candlepin.dump foreman.dump metadata.yml pulp.dump
|
|
||
| - name: Check for running Foreman tasks | ||
| ansible.builtin.uri: | ||
| url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1" |
There was a problem hiding this comment.
| url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1" | |
| url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?search=state%3Drunning&per_page=1" |
Just state=running doesn't seem to filter anything in foreman-tasks. I had to include search for the tasks to be filtered. Otherwise all of my running tasks returned.
Also, can we not use Foreman Ansible Modules for this search?
| @@ -0,0 +1,259 @@ | |||
| --- | |||
| - name: Backup Foreman databases and configuration | |||
| hosts: all | |||
There was a problem hiding this comment.
Without this, if you try doing this via the vagrant-libvirt hypervisor and not localhost, it will try running backup on all your machines. Also, it matches the other playbooks.
| hosts: all | |
| hosts: | |
| - quadlet |
| 'name': (item.RepoTags | first) if item.RepoTags | default([]) | length > 0 else '<none>', | ||
| 'digest': (item.RepoDigests | first) if item.RepoDigests | default([]) | length > 0 else '', | ||
| 'id': item.Id, | ||
| 'created': item.Created | int |
There was a problem hiding this comment.
This ended up being 0 for me, I think because item.Created returns a timestamp.
|
|
||
| - name: Set Foreman running tasks count | ||
| ansible.builtin.set_fact: | ||
| foreman_running_tasks: "{{ foreman_tasks_check.json.total | default(0) | int }}" |
There was a problem hiding this comment.
| foreman_running_tasks: "{{ foreman_tasks_check.json.total | default(0) | int }}" | |
| foreman_running_tasks: "{{ foreman_tasks_check.json.subtotal | default(0) | int }}" |
The total is pre-filtering.
| url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1" | ||
| method: GET | ||
| user: "{{ foreman_initial_admin_username }}" | ||
| password: "{{ foreman_initial_admin_password }}" | ||
| force_basic_auth: true | ||
| validate_certs: false | ||
| return_content: true | ||
| register: foreman_tasks_wait | ||
| until: foreman_tasks_wait.json.total | default(0) == 0 |
There was a problem hiding this comment.
Same issues as above with the querying and json total.
fc68cdc to
6740de3
Compare
|
Pushed some changes based on the last set of reviews.. 👍🏼 |
6740de3 to
e17ecbc
Compare
|
Before Ian left for vacation 🍹 , he sent me his review which have been addressed in the last commit. |
Implements comprehensive offline backup functionality for Foreman deployments: - Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs) - Backs up podman secrets, networks, volumes, quadlet files - Backs up systemd units and foremanctl state - Includes metadata with container image digests for restore compatibility - Preflight checks for running tasks and database integrity (amcheck) - Automatic service restoration on failure Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
e17ecbc to
fe3ed91
Compare
Implements comprehensive offline backup functionality for Foreman deployments:
Why are you introducing these changes? (Problem description, related links)
What are the changes introduced in this pull request?
How to test this pull request
I got a foremanctl box with normal deploy. On this box, clone foremanctl repo and checkout this branch.
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
Then try ./foremanctl --help
Also,
The help section has placeholders for incremental, online and --tar-volume-size which will be implemented in follow up cards/PRs.
You can run :
Command:
I have a dummy restore script which can be used for testing and also getting steps to run manually when testing.:
Download the script
wget https://gist.githubusercontent.com/sjha4/35d98b318f15753a678a406fb0fb14ad/raw/test-restore-final.sh
Make it executable
chmod +x test-restore-final.sh
Run restore
./test-restore-final.sh /path/to/backup/foreman-backup-TIMESTAMP
Steps to reproduce:
Checklist