Add offline backup for foremanctl #507

Open
sjha4 wants to merge 1 commit into theforeman:master from sjha4:backup-offline-implementation

Contributor

@sjha4 sjha4 commented May 13, 2026

Implements comprehensive offline backup functionality for Foreman deployments:

  • Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
  • Backs up podman secrets, networks, volumes, quadlet files
  • Backs up systemd units and foremanctl state
  • Includes metadata with container image digests for restore compatibility
  • Preflight checks for running tasks and database integrity (amcheck)
  • Automatic service restoration on failure

Why are you introducing these changes? (Problem description, related links)

What are the changes introduced in this pull request?

  • Offline backup

How to test this pull request

I used a foremanctl box with a normal deploy. On this box, clone the foremanctl repo and check out this branch:
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl

Then try ./foremanctl --help

Also,

(.venv) [root@ip-10-0-167-40 foremanctl]# ./foremanctl backup  --help
usage: foremanctl backup [-h] [-v] [--incremental] [--online] [--skip-pulp-content] [--tar-volume-size TAR_VOLUME_SIZE] [--wait-for-tasks]
                         backup_dir

Create offline backup of Foreman databases and configuration

positional arguments:
  backup_dir            Directory where backup files will be stored

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --incremental         Perform incremental backup (not yet implemented)
  --online              Perform online backup without stopping services (not yet implemented)
  --skip-pulp-content   Skip Pulp content directory backup (not yet implemented)
  --tar-volume-size TAR_VOLUME_SIZE
                        Split tar archives at specified size in MB (not yet implemented, for Pulp content only)
  --wait-for-tasks      Wait for running tasks to complete instead of failing immediately

The help output has placeholders for --incremental, --online, --skip-pulp-content and --tar-volume-size, which will be implemented in follow-up cards/PRs.

You can run a backup with:

cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
./foremanctl backup /var/tmp/foreman-backup-test --wait-for-tasks

I have a dummy restore script that can be used for testing and for getting the steps to run manually:

Download the script

wget https://gist.githubusercontent.com/sjha4/35d98b318f15753a678a406fb0fb14ad/raw/test-restore-final.sh

Make it executable

chmod +x test-restore-final.sh

Run restore

./test-restore-final.sh /path/to/backup/foreman-backup-TIMESTAMP

Steps to reproduce:

  • On a foremanctl box, run foremanctl backup BACKUP_DIR

Checklist

  • Tests added/updated (if applicable)
  • Documentation updated (if applicable)

@sjha4 sjha4 marked this pull request as ready for review May 13, 2026 19:28
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/backup.yaml Outdated
# Critical volumes to backup
critical_podman_volumes:
- iop-core-kafka-data
- iop-service-vmaas-data
Contributor Author

I have not extensively tested IOP backups beyond the DBs with this. Will need some eyes here.

@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from 15ba9dd to c6de1eb Compare May 14, 2026 14:14
@ianballou ianballou left a comment

I did an automated test run. Take these with a grain of salt, but I wanted to post early in case they're real, to leave extra time to test and fix.

Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:

Bug #1: podman_network.yaml fails when no custom networks exist

File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:

# Option A: tolerate empty result
  failed_when: networks_json.rc not in [0, 1]

# Option B: check first, skip if no custom networks
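
Option A could look roughly like the task below. This is a sketch, not the PR's code: the register name networks_json comes from the snippet above, while the task name and the simplified pipeline (without the while read loop) are assumptions. grep exits 1 when it selects no lines and 2 or higher on a real error, so only 0 and 1 are tolerated.

```yaml
# Sketch only: the task name and the simplified pipeline are assumptions.
- name: List custom podman networks
  ansible.builtin.shell:
    cmd: "podman network ls --format {% raw %}'{{.Name}}'{% endraw %} | grep -v '^podman$'"
  register: networks_json
  changed_when: false
  # grep -v exits 1 when no lines are selected; treat that as "no custom networks".
  failed_when: networks_json.rc not in [0, 1]
```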

Bug #2: Wrong Foreman tasks API endpoint

File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.

Bug #3: pg_isready and pg_dump not available on host

... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.

Bug #4: Hardcoded parameters.yaml path in metadata task

File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.

@sjha4
Contributor Author

sjha4 commented May 14, 2026

Will update the tasks endpoint and the parameters.yaml path.

About the podman networks: those are created for IOP, as in https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml, so I'd assume they are present in production deployments. We can add some handling for when they're not.

@aidenfine
Contributor

Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

@sjha4
Contributor Author

sjha4 commented May 14, 2026

> Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

I am fine either way but it's helpful guidance for future PRs and documentation to look at.

@@ -0,0 +1,40 @@
---
- name: List podman networks

Are podman networks something we need to back up?

Contributor Author

Required for IOP. Not sure if we need a backup for this. I guess restore can just rely on foremanctl deploy if it can, and the backup files here can be for reference/verification.

Member

foremanctl can redeploy all of that, no need to backup

- name: Export critical volumes
  ansible.builtin.command:
    cmd: podman volume export {{ item }} -o {{ backup_dir_full }}/volume-{{ item }}.tar
  loop: "{{ critical_podman_volumes }}"

Cool, this just backs up the important volumes that foreman-maintain backs up today.

@@ -0,0 +1,190 @@
---
# Preflight checks for backup operation

Part of me wonders if the checks should be abstracted out to roles if they will be helpful outside of backup. I suppose we could cross that bridge when we get there. Eventually I envision foremanctl having a library of checks much like foreman-maintain. These checks will go into different playbooks, but even then the checks may run or not run depending on the configuration. Certain flavors or configurations will make some checks applicable and others not.

There is src/roles/checks/ today, which looks like it is meant to be a single spot for all checks.

If any of these checks seem applicable to deployment or perhaps even health which is going to be implemented soon, it might be worth starting to pull the checks out somewhere.

Contributor Author

The main things here are checking for active tasks in Foreman and Pulp and running amcheck on the DBs to be backed up. The DB index checks can be helpful for health and generic checks. Will move those into the checks/ role. 👍🏼

@@ -0,0 +1,28 @@
---
- name: Backup podman quadlet container definitions

I'm thinking about what we would do with quadlet files. These files will contain secrets, volume info, image references, FQDNs. Perhaps some of these we can expect to remain the same, but is the volume info and image info going to remain the same?

Currently, we do not enforce that the z-stream between versions of Foreman/Satellite have to be the same, which tells me these quadlet files may need to be regenerated within the context of the new environment.

Contributor Author

Ack. The restore would generate these on the system with foremanctl deploy. I am wondering if there's value in backing up the container definitions regardless, for reference.


Yeah, I suppose it wouldn't hurt as long as their collection is simple and safe.

  register: foremanctl_state_dir

- name: Backup foremanctl state directory
  community.general.archive:

In order to support incremental exports, tar --listed-incremental needs to be used. It seems this is not supported by community.general.archive, so it might be better to avoid its use altogether.

- name: Backup config files
  ansible.builtin.command:
    cmd: >
      tar --create --gzip
      --listed-incremental={{ backup_dir_full }}/.config.snar
      --ignore-failed-read
      --file {{ backup_dir_full }}/foremanctl_state.tar.gz
      {{ config_file_paths | join(' ') }}

Also, we use config_files in one place but foremanctl-state in another, which is a bit of an inconsistency.

Member

there is no need to do incrementals for the configs. (yes, we do them today in foreman maintain, but I think it's more work than use)

    dest: "{{ backup_dir_full }}/quadlet-files.tar.gz"
    format: gz
    mode: '0644'
  when: database_mode == 'internal'
Member

I've seen this in a few spots, why is this scoped to only internal database? This seems unrelated to the database.

Contributor Author

I only tried this with an internal DB, so the scoping was intentional. However, this is not needed on non-DB backups. Will update. 👍🏼

@sjha4 sjha4 force-pushed the backup-offline-implementation branch 4 times, most recently from 4da7174 to 10c3e00 Compare May 15, 2026 17:44
@@ -0,0 +1,237 @@
---
# Detect which databases exist on the system
Member

Can't we take the info from the enabled features?


+1, I think foreman-maintain previously did a lot of discovery, but with foremanctl we should be able to rely on the saved user configuration.

      - item.stat.exists
      - item.stat.size > 0
    fail_msg: >-
      Database dump failed or produced empty file:
Member

if it failed, pg_dump would have exited non-zero, right? then we'd never reach this step.

---
- name: Check if foremanctl state directory exists
  ansible.builtin.stat:
    path: "{{ lookup('env', 'OBSAH_STATE') }}"
Member

this is already available via the obsah_state_path variable (see theforeman/obsah#86)
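
With that variable, the stat task above might reduce to the following. This is a sketch, assuming obsah_state_path is defined for the play as the comment describes; the register name is the one used elsewhere in this PR.

```yaml
- name: Check if foremanctl state directory exists
  ansible.builtin.stat:
    # obsah_state_path replaces the lookup('env', 'OBSAH_STATE') call
    path: "{{ obsah_state_path }}"
  register: foremanctl_state_dir
```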

    format: gz
    mode: '0644'
    exclude_path:
      - "{{ lookup('env', 'OBSAH_STATE') }}/certs"
Member

why?

Contributor Author

For some reason I was thinking podman secrets would be enough to get these, but I think that was incorrect. Will back this up.

- name: Get hostname
  ansible.builtin.command:
    cmd: hostname -f
  register: hostname_result
Member

ansible_facts['fqdn'] exists
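
A minimal sketch of using the fact in place of the hostname -f command. It assumes fact gathering is enabled for the play; backup_hostname is a hypothetical variable name, not one from the PR.

```yaml
- name: Record hostname for backup metadata
  ansible.builtin.set_fact:
    # backup_hostname is a hypothetical name used for illustration
    backup_hostname: "{{ ansible_facts['fqdn'] }}"
```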


- name: Get OS version
  ansible.builtin.command:
    cmd: cat /etc/redhat-release
Member

I am sure we can get that from ansible directly
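
For example, the distribution facts cover this without reading /etc/redhat-release. A sketch, assuming facts are gathered for the play; backup_os_version is a hypothetical variable name.

```yaml
- name: Record OS version for backup metadata
  ansible.builtin.set_fact:
    # backup_os_version is a hypothetical name used for illustration
    backup_os_version: "{{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"
```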


- name: Gather package facts
  ansible.builtin.package_facts:
    manager: rpm
Member

do we need to force rpm here? I would have expected no, and then foremanctl will continue working on Debian
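
A sketch of the task without the forced manager. The module's manager option defaults to auto, which picks a supported package manager on the target, so the line can also be dropped entirely; it is kept here only to make the choice explicit.

```yaml
- name: Gather package facts
  ansible.builtin.package_facts:
    # auto is the module default and avoids hardcoding rpm
    manager: auto
```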


- name: Query container images
  ansible.builtin.command:
    cmd: podman images --format json
Member

    - parameters_slurp is succeeded
    - parameters_slurp.content is defined

- name: Set enabled features from parameters
Member

you should have access to enabled_features var already

@@ -0,0 +1,112 @@
---
# Backup podman secrets
Member

all these come from foremanctl data, no need to backup these

Contributor Author

The only secrets I was thinking of that need preserving are pulp-django-secret-key and pulp-symmetric-key, which we'd need for pulp data restore. Those are saved in /var/lib/pulp/, which we are backing up, so they would be restored from there. Will remove the secrets and networks backup here. 👍🏼

---
- name: List podman volumes
  ansible.builtin.command:
    cmd: podman volume ls --format {% raw %}'{{.Name}}'{% endraw %}
Member

I am sure there is a module in containers.podman for this
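
One candidate is containers.podman.podman_volume_info. The sketch below assumes the containers.podman collection is available on the controller and that the module returns its results under a volumes key with a Name field per entry, as its documentation describes; podman_volume_names is a hypothetical variable name.

```yaml
- name: List podman volumes
  containers.podman.podman_volume_info:
  register: podman_volumes

- name: Collect volume names
  ansible.builtin.set_fact:
    # podman_volume_names is a hypothetical name used for illustration
    podman_volume_names: "{{ podman_volumes.volumes | map(attribute='Name') | list }}"
```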

@evgeni
Member

evgeni commented May 19, 2026

not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :(

@sjha4
Contributor Author

sjha4 commented May 19, 2026

Based on the direction of the reviews, I am thinking that we do not need to back up anything that can be safely regenerated by foremanctl deploy. That would mean I can take out the podman secrets, podman networks, podman volumes, quadlet and systemd file backups. That would leave us with the DBs, /var/lib/pulp for pulp content, and foremanctl state. Am I missing anything?

@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from a759002 to 90c76de Compare May 20, 2026 16:12
Comment thread src/playbooks/backup/tasks/pulp_content.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch from 90c76de to fc68cdc Compare May 20, 2026 16:46
@sjha4
Contributor Author

sjha4 commented May 20, 2026

Updated the PR to drop backups for podman networks, volumes and secrets. We can rely on deploy to recreate these. The backup now only backs up DBs, pulp content and foremanctl state. Also addressed some reviews around using modules where applicable.

@ianballou ianballou left a comment

Just a couple of small bugs, once I worked through them I got a backup!

$ ls /var/tmp/foreman-backup-test/foreman-backup-20260520T193319/
candlepin.dump  foreman.dump  metadata.yml  pulp.dump


- name: Check for running Foreman tasks
  ansible.builtin.uri:
    url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1"

Suggested change
-     url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1"
+     url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?search=state%3Drunning&per_page=1"

Just state=running doesn't seem to filter anything in foreman-tasks. I had to include search for the tasks to be filtered. Otherwise all of my tasks were returned, not just the running ones.

Also, can we not use Foreman Ansible Modules for this search?

Comment thread src/playbooks/backup/backup.yaml Outdated
@@ -0,0 +1,259 @@
---
- name: Backup Foreman databases and configuration
  hosts: all

Without this, if you try doing this via the vagrant-libvirt hypervisor and not localhost, it will try running backup on all your machines. Also, it matches the other playbooks.

Suggested change
-   hosts: all
+   hosts:
+     - quadlet

'name': (item.RepoTags | first) if item.RepoTags | default([]) | length > 0 else '<none>',
'digest': (item.RepoDigests | first) if item.RepoDigests | default([]) | length > 0 else '',
'id': item.Id,
'created': item.Created | int

This ended up being 0 for me, I think because item.Created returns a timestamp.


- name: Set Foreman running tasks count
  ansible.builtin.set_fact:
    foreman_running_tasks: "{{ foreman_tasks_check.json.total | default(0) | int }}"

Suggested change
-     foreman_running_tasks: "{{ foreman_tasks_check.json.total | default(0) | int }}"
+     foreman_running_tasks: "{{ foreman_tasks_check.json.subtotal | default(0) | int }}"

The total is pre-filtering.
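
Taken together with the search suggestion above, the check might look like the sketch below. Variable names are the ones already used in this PR; status_code: 200 is an added assumption so that a wrong endpoint fails loudly instead of being silently ignored.

```yaml
- name: Check for running Foreman tasks
  ansible.builtin.uri:
    # search=state%3Drunning is the URL-encoded form of search=state=running
    url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?search=state%3Drunning&per_page=1"
    method: GET
    user: "{{ foreman_initial_admin_username }}"
    password: "{{ foreman_initial_admin_password }}"
    force_basic_auth: true
    validate_certs: false
    return_content: true
    status_code: 200
  register: foreman_tasks_check

- name: Set Foreman running tasks count
  ansible.builtin.set_fact:
    # subtotal counts tasks matching the search; total counts all tasks
    foreman_running_tasks: "{{ foreman_tasks_check.json.subtotal | default(0) | int }}"
```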

Comment on lines +44 to +52
    url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?state=running&per_page=1"
    method: GET
    user: "{{ foreman_initial_admin_username }}"
    password: "{{ foreman_initial_admin_password }}"
    force_basic_auth: true
    validate_certs: false
    return_content: true
  register: foreman_tasks_wait
  until: foreman_tasks_wait.json.total | default(0) == 0

Same issues as above with the querying and json total.
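
Applied to the wait loop, the same two fixes could be sketched as follows. The task name and the retries and delay values are illustrative assumptions, not taken from the PR.

```yaml
- name: Wait for running Foreman tasks to finish
  ansible.builtin.uri:
    url: "https://{{ foreman_server_fqdn }}/foreman_tasks/api/tasks?search=state%3Drunning&per_page=1"
    method: GET
    user: "{{ foreman_initial_admin_username }}"
    password: "{{ foreman_initial_admin_password }}"
    force_basic_auth: true
    validate_certs: false
    return_content: true
  register: foreman_tasks_wait
  # Poll until no task matches the search filter
  until: (foreman_tasks_wait.json.subtotal | default(0) | int) == 0
  # Illustrative values: up to 30 attempts, 10 seconds apart
  retries: 30
  delay: 10
```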

@sjha4 sjha4 force-pushed the backup-offline-implementation branch from fc68cdc to 6740de3 Compare May 21, 2026 15:40
@sjha4
Contributor Author

sjha4 commented May 21, 2026

Pushed some changes based on the last set of reviews. 👍🏼

@sjha4 sjha4 force-pushed the backup-offline-implementation branch from 6740de3 to e17ecbc Compare May 26, 2026 17:22
@sjha4
Contributor Author

sjha4 commented May 26, 2026

Before Ian left for vacation 🍹, he sent me his review, which has been addressed in the last commit.

## foremanctl backup PR #507 — Test Report v3

**Commit:** `6740de3` | **Box:** katello-production | **Date:** 2026-05-21

### Verdict: Backup works end-to-end ✅ | Two non-blocking issues 🔍

**All three v2 bugs are fixed.** The backup ran to successful completion — no errors, no rescue block. All 3 databases dumped in `pg_dump -Fc` format, verified valid. Services stopped and restarted cleanly. API healthy post-backup. Preflight correctly queries foreman_tasks with `?search=state%3Drunning` and checks `subtotal`. Only `quadlet` host targeted.

### New Issues

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| **1** | Container images duplicated in metadata | **Low** | Metadata task is called twice (before dumps + after pulp). The image list uses `default([]) + [...]` so the second run appends duplicates. 6 images appear as 12. |
| **2** | Pulp encryption keys not backed up when media empty | **Medium** | `database_fields.symmetric.key` and `django_secret_key` are included in the pulp content tar, which is gated on `pulp_media_files.matched > 0`. If media is empty, keys aren't backed up, but they're critical for restore. |
| - | Dead code: `critical_podman_volumes` var | **Cleanup** | Defined but never referenced. Leftover from removed podman volumes backup. |
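
For issue 1, one option is to deduplicate the list after it is built (another is to reset it at the start of the metadata task). A sketch, assuming the image list is stored in a fact; backup_container_images is a hypothetical name, since the report does not give the actual variable.

```yaml
- name: Deduplicate container image metadata
  ansible.builtin.set_fact:
    # backup_container_images is a hypothetical name; unique drops repeated entries
    backup_container_images: "{{ backup_container_images | default([]) | unique }}"
```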

@sjha4 sjha4 requested review from ehelms and evgeni May 27, 2026 14:32
Implements comprehensive offline backup functionality for Foreman deployments:
- Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
- Backs up podman secrets, networks, volumes, quadlet files
- Backs up systemd units and foremanctl state
- Includes metadata with container image digests for restore compatibility
- Preflight checks for running tasks and database integrity (amcheck)
- Automatic service restoration on failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sjha4 sjha4 force-pushed the backup-offline-implementation branch from e17ecbc to fe3ed91 Compare May 29, 2026 19:33
5 participants