Conversation

@paveliak
Collaborator

Two major changes in this PR:

  1. The image-building workflow now produces a VM whose root FS is protected with dm-verity
  2. The verity tree is stored in a separate partition (file devices are no longer used)
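A minimal sketch of the second change, assuming the partition layout shown in the `lsblk` output below (the `run`/`DRY_RUN` wrapper is mine, added so the privileged command is printed rather than executed; the actual build workflow in this PR may invoke `veritysetup` differently):

```shell
#!/bin/sh
# Dry-run wrapper: with DRY_RUN=1 (the default) privileged commands are
# printed instead of executed, since `veritysetup format` on real
# partitions requires root.
run() { [ "${DRY_RUN:-1}" = 1 ] && echo "+ $*" || "$@"; }

ROOT_PART=/dev/sda1   # read-only root file system (29.9G in lsblk below)
HASH_PART=/dev/sda2   # dedicated verity-tree partition (3G in lsblk below)

# Compute the hash tree into the hash partition instead of a loop-mounted
# file device; the printed root hash is what initrd later needs to open
# the verity device.
run veritysetup format "$ROOT_PART" "$HASH_PART"
```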

TODO (for future PRs):

  • Add attestation
  • Migration from GRUB to systemd-boot may be needed for measured boot

findmnt

TARGET                                        SOURCE                 FSTYPE      OPTIONS
/                                             overlay                overlay     rw,relatime,lowerdir=/roroot,upperdir=/mnt/overlay/upper,workdir=/mnt/overlay/work,uuid=on,nouserxattr
├─/sys                                        sysfs                  sysfs       rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware/efi/efivars                 efivarfs               efivarfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security                      securityfs             securityfs  rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup                            cgroup2                cgroup2     rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
│ ├─/sys/fs/pstore                            pstore                 pstore      rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/bpf                               bpf                    bpf         rw,nosuid,nodev,noexec,relatime,mode=700
│ ├─/sys/kernel/debug                         debugfs                debugfs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/tracing                       tracefs                tracefs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/fuse/connections                  fusectl                fusectl     rw,nosuid,nodev,noexec,relatime
│ └─/sys/kernel/config                        configfs               configfs    rw,nosuid,nodev,noexec,relatime
├─/proc                                       proc                   proc        rw,nosuid,nodev,noexec,relatime
│ └─/proc/sys/fs/binfmt_misc                  systemd-1              autofs      rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=2859
│   └─/proc/sys/fs/binfmt_misc                binfmt_misc            binfmt_misc rw,nosuid,nodev,noexec,relatime
├─/dev                                        udev                   devtmpfs    rw,nosuid,relatime,size=8141804k,nr_inodes=2035451,mode=755,inode64
│ ├─/dev/pts                                  devpts                 devpts      rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/shm                                  tmpfs                  tmpfs       rw,nosuid,nodev,inode64
│ ├─/dev/hugepages                            hugetlbfs              hugetlbfs   rw,relatime,pagesize=2M
│ └─/dev/mqueue                               mqueue                 mqueue      rw,nosuid,nodev,noexec,relatime
├─/run                                        tmpfs                  tmpfs       rw,nosuid,nodev,noexec,relatime,size=1632692k,mode=755,inode64
│ ├─/run/lock                                 tmpfs                  tmpfs       rw,nosuid,nodev,noexec,relatime,size=5120k,inode64
│ ├─/run/credentials/systemd-sysusers.service ramfs                  ramfs       ro,nosuid,nodev,noexec,relatime,mode=700
│ ├─/run/user/1000                            tmpfs                  tmpfs       rw,nosuid,nodev,relatime,size=1632688k,nr_inodes=408172,mode=700,uid=1000,gid=1000,inode64
│ └─/run/snapd/ns                             tmpfs[/snapd/ns]       tmpfs       rw,nosuid,nodev,noexec,relatime,size=1632692k,mode=755,inode64
│   └─/run/snapd/ns/lxd.mnt                   nsfs[mnt:[4026532203]] nsfs        rw
├─/snap/lxd/31333                             /dev/loop1             squashfs    ro,nodev,relatime,errors=continue,threads=single
├─/snap/core20/2599                           /dev/loop0             squashfs    ro,nodev,relatime,errors=continue,threads=single
├─/snap/snapd/24792                           /dev/loop2             squashfs    ro,nodev,relatime,errors=continue,threads=single
├─/boot/efi                                   /dev/sda15             vfat        rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
└─/mnt                                        /dev/sdb1              ext4        rw,relatime

lsblk

NAME            MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
loop0             7:0    0 63.8M  1 loop  /snap/core20/2599
loop1             7:1    0 89.4M  1 loop  /snap/lxd/31333
loop2             7:2    0 49.3M  1 loop  /snap/snapd/24792
sda               8:0    0   33G  0 disk  
├─sda1            8:1    0 29.9G  0 part  
│ └─slsa-verity 252:0    0 29.9G  1 crypt 
├─sda2            8:2    0    3G  0 part  
│ └─slsa-verity 252:0    0 29.9G  1 crypt 
├─sda14           8:14   0    4M  0 part  
└─sda15           8:15   0  106M  0 part  /boot/efi
sdb               8:16   0  150G  0 disk  
└─sdb1            8:17   0  150G  0 part  /mnt

Signed-off-by: Pavel Iakovenko <[email protected]>
Added a command to list files in the /tmp directory.

Signed-off-by: Pavel Iakovenko <[email protected]>
Added commands to create a Shared Image Gallery and image definition in Azure.

Signed-off-by: Pavel Iakovenko <[email protected]>
Collaborator

@marcelamelara marcelamelara left a comment

Thanks @paveliak ! I haven't tested it yet, but overall I think this looks good! I have a few outstanding questions.

Collaborator

What's the difference between top-verity and bottom-verity?

Collaborator Author

The top-verity script opens the verity device, which effectively creates /dev/mapper/<DEVICE_NAME>. Because it is placed in the local-top directory, it is executed before the root fs is mounted.

The bottom-verity script creates an OverlayFS using the previously opened verity device (from top-verity) as the lower layer and the Azure VM's local temp disk (an ephemeral drive that is wiped on every reboot) as the upper layer. It then mounts the OverlayFS as the rw root file system.

The naming is not great because the top/bottom wording is used both for initramfs stages and for OverlayFS layers, and the funniest part is that the lower layer of the OverlayFS is created in the top-verity script 😁

I am not sure why we have this split. Technically we should be able to do everything in bottom-verity.
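The two hooks described above can be sketched roughly as follows (a hedged sketch, not the exact scripts in this PR: the `run`/`DRY_RUN` wrapper is mine, `ROOT_HASH` is a placeholder, and only the device/mount names are taken from the `findmnt`/`lsblk` output — `slsa-verity`, `/roroot`, `/mnt/overlay/{upper,work}`):

```shell
#!/bin/sh
# Dry-run wrapper so the privileged commands are printed, not executed.
run() { [ "${DRY_RUN:-1}" = 1 ] && echo "+ $*" || "$@"; }

# local-top/top-verity: open the verity device before the root fs is
# mounted, creating /dev/mapper/slsa-verity.
open_verity() {
    run veritysetup open /dev/sda1 slsa-verity /dev/sda2 "$ROOT_HASH"
}

# local-bottom/bottom-verity: use the read-only verity device as the
# OverlayFS lower layer, the ephemeral disk as the upper layer, and
# mount the result as the writable root.
mount_overlay() {
    run mount -o ro /dev/mapper/slsa-verity /roroot
    run mkdir -p /mnt/overlay/upper /mnt/overlay/work
    run mount -t overlay overlay \
        -o lowerdir=/roroot,upperdir=/mnt/overlay/upper,workdir=/mnt/overlay/work \
        "${rootmnt:-/root}"
}

# Placeholder: the real value comes from the config file on the boot partition.
ROOT_HASH="${ROOT_HASH:-<root-hash-from-boot-config>}"
open_verity
mount_overlay
```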


The image has 3 notable partitions - `boot`, `root file system` and `verity tree`. The verity tree contains hashes of the root file system. Verity configuration data (e.g., the root hash) is passed in a well-known configuration file on the boot partition. This file is processed by `initrd` to properly initialize (i.e., open) the verity device. The root hash is measured into the TPM and is therefore present in the remote attestation quote.
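For illustration, such a configuration file might look like the fragment below (the file name and key names are hypothetical — the actual format is defined by the image build workflow in this PR; only the device paths match the `lsblk` output above):

```shell
# /boot/efi/verity.conf (hypothetical name and keys)
DATA_DEVICE=/dev/sda1
HASH_DEVICE=/dev/sda2
ROOT_HASH="<hex root hash printed by veritysetup format>"
```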

Initrd sets up an OverlayFS for the root file system using the local ephemeral disk as the writable storage device. Build environments are ephemeral at `Build L3`, and intermediate data is not expected to be preserved upon termination of the environment. To achieve `BuildEnv L3`, temporary storage must be encrypted, which could be done with an ephemeral key generated by initrd when the environment boots. `BuildEnv L2` does not require encryption.

Collaborator

Do we want to enforce in the demo that the storage must be encrypted at L3? I thought we took that out of the specification.

Collaborator Author

It has to be encrypted at L3, because otherwise host-based malware may tamper with the temporary data (and all build artifacts will be there). I do not think it is a big deal to encrypt the temporary drive: it is ephemeral, so there is no need to manage keys. And the confidential VM will make sure that the keys in memory are protected from a malicious host.
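The ephemeral-key scheme could be sketched like this (my assumption of how it might work, not code from this PR; the `run`/`DRY_RUN` wrapper is mine, and `/dev/sdb1` is assumed to be the Azure temp disk per the `findmnt` output):

```shell
#!/bin/sh
# Dry-run wrapper: privileged commands are printed, not executed.
run() { [ "${DRY_RUN:-1}" = 1 ] && echo "+ $*" || "$@"; }

# Fresh random key per boot; in initrd this would live on a ramfs and
# never touch persistent storage, so no key management is needed.
KEYFILE=$(mktemp)
head -c 64 /dev/urandom > "$KEYFILE"

# Plain dm-crypt leaves no header on disk: after a reboot the old
# contents are unreadable, which matches the ephemeral-data model.
run cryptsetup open --type plain --key-file "$KEYFILE" /dev/sdb1 ephemeral
run mkfs.ext4 /dev/mapper/ephemeral
rm -f "$KEYFILE"
```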


This demo uses 2 VMs - `ImageVM` and `HasherVM`.
- `ImageVM` is used to build the target image; it can be populated with all the tools and data needed.
- `HasherVM` is a "worker" VM whose sole purpose is to compute the verity tree over the ImageVM root FS.

Collaborator

This makes me think that the HasherVM is "more trusted" than the ImageVM. We may want to note a basic trust model for these VMs here.

Collaborator Author

I think they are about the same from a trust perspective. HasherVM does not validate the data it hashes, but it does have the ability to modify the data produced by the ImageVM. A de-risking approach here could be to put the verity hashes on a separate disk; then we could attach the data disk to the HasherVM as a read-only device.

But that seems to unnecessarily complicate things from the distribution perspective, because the image provider would then have to distribute two images.

I will mention the trust relationships in the doc.

Collaborator Author

Added text about trusting VMs.

@marcelamelara the challenging part is L3. If the BuildProvider is at the BuildEnv L3 level then it should run ImageVM and HasherVM as confidential VMs with encrypted disks. However, the image itself should not be encrypted. The question then is: how do we generate verity hashes for the encrypted disk as if it were not encrypted?

One solution that comes to mind is that HasherVM:

  1. creates a non-encrypted block file on the encrypted disk
  2. block-copies all the ImageVM data into the block file
  3. computes verity hashes for the block file
  4. creates an Azure disk (or the equivalent cloud object) from the image

In other words, we need to operate on a logical disk that lives on the encrypted physical disk.
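The four steps above could be sketched as follows (a hedched-down toy version of my own devising: it hashes a small scratch file instead of a real disk, the source-device path is a hypothetical example, and `veritysetup format` on regular files needs no root, so steps 1 and 3 actually run here):

```shell
#!/bin/sh
set -e

# 1. Create a non-encrypted block file; in the real flow this would sit
#    on the HasherVM's encrypted data disk.
IMG=$(mktemp)
HASH=$(mktemp)
truncate -s 8M "$IMG"

# 2. Block-copy the ImageVM data into it, e.g. (hypothetical device path):
#    dd if=/dev/disk/azure/scsi1/lun0 of="$IMG" bs=4M

# 3. Compute the verity tree over the plaintext block file; this prints
#    the root hash that goes into the boot-partition config.
if command -v veritysetup >/dev/null 2>&1; then
    veritysetup format "$IMG" "$HASH"
else
    echo "veritysetup not installed; skipping hash computation"
fi

# 4. The block file (not the encrypted physical disk) is what gets
#    uploaded as the Azure disk / Shared Image Gallery version.
```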

paveliak and others added 3 commits October 13, 2025 15:01
Co-authored-by: Marcela Melara <[email protected]>
Signed-off-by: Pavel Iakovenko <[email protected]>
Removed unnecessary command to list files in /tmp.

Signed-off-by: Pavel Iakovenko <[email protected]>