@Johan-Liebert1 (Collaborator) commented Oct 13, 2025

Add a command to delete a composefs native deployment

Deleting a deployment means deleting the EROFS image, the bootloader
entries for that deployment, and any objects in the composefs
repository that are referenced only by that deployment.
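
As an illustration of "objects that are referenced only by that deployment", here is a minimal sketch that computes that set as a difference over an in-memory map. The `Refs` alias and `objects_only_referenced_by` are hypothetical names invented for this example; they are not part of bootc or composefs-rs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: deployment id -> digests of the objects it references.
type Refs = BTreeMap<String, BTreeSet<String>>;

/// Returns the objects referenced by `target` and by no other deployment,
/// or `None` if `target` is not a known deployment.
fn objects_only_referenced_by(refs: &Refs, target: &str) -> Option<BTreeSet<String>> {
    let mine = refs.get(target)?;

    // Union of everything the remaining deployments still need.
    let still_needed: BTreeSet<&str> = refs
        .iter()
        .filter(|(id, _)| id.as_str() != target)
        .flat_map(|(_, objs)| objs.iter().map(String::as_str))
        .collect();

    // Keep only the objects nobody else references.
    Some(
        mine.iter()
            .filter(|o| !still_needed.contains(o.as_str()))
            .cloned()
            .collect(),
    )
}
```

Objects shared with another deployment are left out of the result, so they survive the deletion.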

Also refactor some functions and add error contexts in some places

Draft PR for now as it requires some more adjustments

Opt::ConfigDiff => get_etc_diff().await,

#[cfg(feature = "composefs-backend")]
Opt::DeleteDeployment { depl_id, delete } => {
Collaborator

Note that this is also tracked as an RFE in #1276.

pub bootloader: Bootloader,
/// The sha256sum of vmlinuz + initrd
/// Only `Some` for Type1 boot entries
pub boot_digest: Option<String>,
Collaborator

Instead of String, let's use something like https://docs.rs/oci-spec/latest/oci_spec/image/struct.Sha256Digest.html for values like this?

Though that might be a bit harder with the jsonschema

Collaborator Author

Yeah, I think that's a good idea

Collaborator Author

With the jsonschema it's a bit tricky
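
To make the "typed digest instead of `String`" idea concrete, here is a minimal std-only sketch of a validating newtype. `Sha256Hex`, `InvalidDigest` and `BootEntry` are hypothetical names for this example; this is not the `oci_spec` type linked above, and restricting input to lowercase hex is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical newtype: a SHA-256 digest as 64 lowercase hex characters.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Sha256Hex(String);

/// Error returned when a string is not a valid digest.
#[derive(Debug, PartialEq, Eq)]
struct InvalidDigest;

impl FromStr for Sha256Hex {
    type Err = InvalidDigest;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
        let valid = s.len() == 64
            && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if valid {
            Ok(Sha256Hex(s.to_owned()))
        } else {
            Err(InvalidDigest)
        }
    }
}

impl fmt::Display for Sha256Hex {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Mirrors the shape of the field under review: only `Some` for Type1 entries.
struct BootEntry {
    boot_digest: Option<Sha256Hex>,
}
```

With a type like this, malformed digests are rejected at parse time rather than when they are compared. It does not address the jsonschema concern raised in this thread.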

let Some(depl_to_del) = depl_to_del else {
    anyhow::bail!("Deployment {deployment_id} not found");
};

Collaborator

I think there's a missing check here that refuses to delete the booted one?

Collaborator Author

It's at https://github.com/bootc-dev/bootc/pull/1685/files/883110d50b9245951ba693b87aad02d132c73e8f#diff-12f8773f52c03d119c8cc2cb67374f160f918ff0baae4bd10306f1ce84120dd6R326

I have a delete param passed in to this function for debugging purposes. It might be helpful just to see everything that would be deleted.

Collaborator Author

I removed the param, as it doesn't really make much sense
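
For readers following this thread, a minimal sketch of the guard being discussed: look the deployment up, then refuse if it is the booted one. The `Deployment` struct and `select_for_deletion` are hypothetical simplifications, and the error strings are illustrative only. The real code uses `anyhow`, whereas this sketch uses `String` errors to stay std-only.

```rust
/// Hypothetical, simplified view of a deployment.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Deployment {
    id: String,
    booted: bool,
}

/// Find `deployment_id` and refuse to hand back the booted deployment.
fn select_for_deletion<'a>(
    deployments: &'a [Deployment],
    deployment_id: &str,
) -> Result<&'a Deployment, String> {
    let Some(depl) = deployments.iter().find(|d| d.id == deployment_id) else {
        return Err(format!("Deployment {deployment_id} not found"));
    };

    // Deleting the deployment we are running from would break the system.
    if depl.booted {
        return Err(format!(
            "Cannot delete currently booted deployment {deployment_id}"
        ));
    }

    Ok(depl)
}
```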

Comment on lines 204 to 205
grub_dir
    .atomic_write(USER_CFG_STAGED, buffer)
    .with_context(|| format!("Writing to {USER_CFG_STAGED}"))?;

rustix::fs::fsync(grub_dir.reopen_as_ownedfd().context("Reopening")?).context("fsync")?;
Collaborator

We could make an atomic_write_fsync helper

Collaborator

Also btw there's https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.atomic_replace_with that supports writing-as-you-go instead of building up the in memory buffer.

Collaborator Author

Updated to use atomic_replace_with

> We could make an atomic_write_fsync helper

Looking at it again, we don't have any one-shot write + fsync. It's mostly a couple of writes followed by a final fsync.
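
The pattern discussed in this thread (stream the content, then fsync, then atomically replace) can be sketched with the standard library alone. This is not the cap-std-ext `atomic_replace_with` implementation. `atomic_write_streaming` is a hypothetical helper, the temporary file naming is an assumption, and syncing the directory through `File::open` assumes a Unix-like system.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: stream into a temp file in `dir`, fsync it,
/// rename it over `name`, then fsync the directory.
fn atomic_write_streaming<F>(dir: &Path, name: &str, write_body: F) -> io::Result<()>
where
    F: FnOnce(&mut BufWriter<File>) -> io::Result<()>,
{
    let tmp_path = dir.join(format!(".{name}.tmp"));
    let mut writer = BufWriter::new(File::create(&tmp_path)?);

    // The caller writes as it goes; no full in-memory buffer is built up.
    write_body(&mut writer)?;

    // Push buffered bytes to the file, then force them to stable storage.
    writer.flush()?;
    writer.get_ref().sync_all()?;

    // Rename is atomic within one filesystem: readers see old or new content.
    fs::rename(&tmp_path, dir.join(name))?;

    // Sync the directory so the rename itself is durable (Unix assumption).
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

A caller would pass a closure that issues several `writeln!` calls. Note that this sketch leaves the temporary file behind if the closure fails, which a real helper would clean up.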

.remove_dir_all(&state_dir)
.with_context(|| format!("Removing dir {state_dir:?}"))?;

for sha in diff {
Collaborator

This should really be an api in composefs-rs right?

Also bigger picture...I think we want to very clearly separate two things (as libostree does): managing the "GC roots" vs a GC operation.

Deleting a deployment starts with unlinking its GC root: the bootloader entry.

But thereafter we should just invoke a generic "GC" operation which traverses all active roots.

The idea here is we must support being interrupted - and we want to be idempotent.

Collaborator Author

Agreed. There are a few failure cases that need to be handled

> This should really be an api in composefs-rs right?

The deletion of unreferenced objects? It does make sense for it to be in composefs-rs. There's even a command for it, but right now it's basically a no-op.
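
The separation proposed in this thread (unlink the GC root first, then run a generic GC over all remaining roots) can be sketched with an in-memory model. `Repo`, `unlink_root` and `gc` are hypothetical names for this example; this is neither the composefs-rs API nor the code in this PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for a composefs repository.
#[derive(Debug, Default)]
struct Repo {
    /// GC roots: image ids still named by a bootloader entry.
    boot_entries: BTreeSet<String>,
    /// Image id -> digests of the objects it references.
    images: BTreeMap<String, BTreeSet<String>>,
    /// Every object present in the object store.
    objects: BTreeSet<String>,
}

impl Repo {
    /// Step 1: unlink the GC root. Nothing else is touched.
    /// Returns whether the root existed.
    fn unlink_root(&mut self, image_id: &str) -> bool {
        self.boot_entries.remove(image_id)
    }

    /// Step 2: generic GC. Returns (images removed, objects removed).
    fn gc(&mut self) -> (usize, usize) {
        let (images_before, objects_before) = (self.images.len(), self.objects.len());

        // Drop images that no root points to.
        let roots = &self.boot_entries;
        self.images.retain(|id, _| roots.contains(id));

        // Mark: everything reachable from the surviving images.
        let live: BTreeSet<&String> = self.images.values().flatten().collect();

        // Sweep: remove every object that was not marked.
        self.objects.retain(|o| live.contains(o));

        (
            images_before - self.images.len(),
            objects_before - self.objects.len(),
        )
    }
}
```

Because `gc` derives everything from the current roots, an interrupted run can simply be repeated, and a second run after a completed one removes nothing.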

Add a command to delete a composefs native deployment

Deleting a deployment means deleting the EROFS image, the bootloader
entries for that deployment, and any objects in the composefs
repository that are referenced only by that deployment.

Also refactor some functions and add error contexts in some places

Signed-off-by: Pragyan Poudyal <[email protected]>

composefs-backend: Deleting staged deployment

Signed-off-by: Pragyan Poudyal <[email protected]>
Delete the boot entries first, the image second, and everything else
afterwards. If we fail to delete the boot entry, there is no point in
deleting the image: the boot entry would still be shown, but it would
have no image to boot.

We delete the objects last because, if they are left behind, a later gc
operation will find no image referencing them and can remove them then.

The state directory shouldn't have any effect on boot if the image
associated with it doesn't exist.

If the staged file /run/composefs/staged-deployment exists but we have
already deleted the staged image, the finalize service would fail, but
that wouldn't break anything.

Signed-off-by: Pragyan Poudyal <[email protected]>
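
The ordering described in this commit message can be sketched as a fixed sequence that stops at the first failure, so a failed boot entry removal never leads to a deleted image. `Step` and `delete_in_order` are hypothetical names; the actual removal work is left to a caller-supplied closure.

```rust
/// Hypothetical names for the phases described in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    BootEntries,
    Image,
    Rest, // state directory, objects, and anything else
}

/// Run the steps in order and stop at the first failure.
/// Returns the completed steps, or the failing step with its error.
fn delete_in_order<F>(mut run: F) -> Result<Vec<Step>, (Step, String)>
where
    F: FnMut(Step) -> Result<(), String>,
{
    let mut done = Vec::new();
    for step in [Step::BootEntries, Step::Image, Step::Rest] {
        // `?` returns early, so later steps never run after a failure.
        run(step).map_err(|e| (step, e))?;
        done.push(step);
    }
    Ok(done)
}
```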
@github-actions github-actions bot added the area/install Issues related to `bootc install` label Oct 28, 2025
Update deployment deletion to delete only the bootloader entries
related to the deployment and then call a `gc` function. That function
computes the difference between the state represented by the bootloader
entries and the state of the repository, then reconciles the two by
performing a GC operation on the repository.

Signed-off-by: Pragyan Poudyal <[email protected]>
Add debug logs for whatever is being deleted
Remove the `delete` param from `delete_deployment` function
Use `atomic_replace_with` instead of writing to a buffer then writing

Signed-off-by: Pragyan Poudyal <[email protected]>
Signed-off-by: Pragyan Poudyal <[email protected]>
@Johan-Liebert1 Johan-Liebert1 marked this pull request as ready for review October 28, 2025 11:39
@bootc-bot bootc-bot bot requested a review from gursewak1997 October 28, 2025 11:39
@Johan-Liebert1
Collaborator Author

Had some discussions with @allisonkarlitskaya on Matrix about how we'd want to handle gc for a composefs repo. The gc_objects function should not be in bootc and will be moved to composefs-rs.

Collaborator

@cgwalters left a comment

Looks generally OK, nothing blocking.

""
};

tracing::info!("Deleting {kind}deployment '{deployment_id}'");
Collaborator

Minor, but let's use structured logging for this instead, like tracing::info!(kind = kind, "Deleting deployment: {deployment_id}") or so


delete_depl_boot_entries(&depl_to_del, deleting_staged)?;

composefs_gc().await?;
Collaborator

I think the GC should be a clearly separate step in general so we can amortize its cost (e.g. "delete 3 deployments, then gc") and also skip the gc ("lazy prune storage later")
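
One possible shape for this, sketched under assumptions: deletion only unlinks roots, and the caller decides whether to collect now or later. `Store`, `Gc` and `delete_deployments` are hypothetical names, and the `gc_runs` counter exists only to make the amortization visible.

```rust
use std::collections::BTreeSet;

/// Whether to collect garbage right after unlinking, or defer it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Gc {
    Now,
    Later,
}

/// Hypothetical in-memory store.
#[derive(Debug, Default)]
struct Store {
    /// Deployments that still have a bootloader entry.
    roots: BTreeSet<String>,
    /// Deployments whose data is still on disk.
    on_disk: BTreeSet<String>,
    /// How many times the (potentially expensive) GC pass ran.
    gc_runs: usize,
}

impl Store {
    /// Remove on-disk data for every deployment that has no root left.
    fn gc(&mut self) {
        let roots = &self.roots;
        self.on_disk.retain(|d| roots.contains(d));
        self.gc_runs += 1;
    }

    /// Unlink any number of deployments, then run at most one GC pass.
    fn delete_deployments(&mut self, ids: &[&str], gc: Gc) {
        for id in ids {
            self.roots.remove(*id);
        }
        if gc == Gc::Now {
            self.gc();
        }
    }
}
```

Deleting three deployments with `Gc::Now` costs a single GC pass, while `Gc::Later` leaves the data in place until some later call to `gc`.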

#[fn_error_context::context("Listing EROFS images")]
fn list_erofs_images(sysroot: &Dir) -> Result<Vec<String>> {
    let images_dir = sysroot
        .open_dir("composefs/images")
Collaborator

Hmm there isn't an API for this in Repository?

I think what would help here is if this API actually took a Repository instance, for now at least - we can reference its fd and read things directly. Same elsewhere.
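
A rough sketch of that suggestion, with the listing hanging off the repository handle instead of taking the sysroot directory. The `Repository` struct here is a hypothetical std-only stand-in that stores a path. The real composefs-rs type and the cap-std `Dir` used in this PR work with directory file descriptors, and `list_images` is an invented name.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for a repository handle (path-based for simplicity).
struct Repository {
    root: PathBuf,
}

impl Repository {
    fn open(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// List the entry names under `images/`, sorted for stable output.
    fn list_images(&self) -> io::Result<Vec<String>> {
        let mut names = Vec::new();
        for entry in fs::read_dir(self.root.join("images"))? {
            let name = entry?.file_name();
            // Image names are hex digests; skip anything that is not UTF-8.
            if let Some(name) = name.to_str() {
                names.push(name.to_owned());
            }
        }
        names.sort();
        Ok(names)
    }
}
```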

@cgwalters cgwalters enabled auto-merge (rebase) October 28, 2025 13:12
@cgwalters
Collaborator

Hmm, it looks like we had a timeout during install in this run, only for c9s: https://github.com/bootc-dev/bootc/actions/runs/18873507377/job/53857419924?pr=1685

@cgwalters cgwalters merged commit 5daa432 into bootc-dev:main Oct 28, 2025
46 of 48 checks passed