Skip to content

Conversation

@kperakis-zededa
Copy link

@kperakis-zededa kperakis-zededa commented Nov 24, 2025

This PR extends ZInfoHardware and related messages to support the hardware inventory work for 4 steps to Nirvana. It adds structured PCI/USB/serial/network/CAN, BIOS, CPU/NUMA fields and basic node-level summary (memory, storage, CPUs, kernel, platform, TPM/watchdog/LED presence).

@christoph-zededa
Copy link
Contributor

Can you squash the commits? F.e. I don't think we need [move inventory into hardwareinfo](https://github.com/lf-edge/eve-api/pull/127/commits/983017d102655f967b4e682b6282717c10b12d96) in the final merge.

@kperakis-zededa
Copy link
Author

Can you squash the commits? F.e. I don't think we need [move inventory into hardwareinfo](https://github.com/lf-edge/eve-api/pull/127/commits/983017d102655f967b4e682b6282717c10b12d96) in the final merge.

Sure i will update it.


string kernel_flavor = 17; // e.g. "pc", "rt", "virt"
string kernel_version = 18; // e.g. "6.6.45-eve"
string eve_platform = 19; // board/platform name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rucoder will this platform string be sufficient to identify that it is an evaulation image and which partition of the evaulation image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the discussion with @rucoder and @christoph-zededa we will add a field for the partition and the kernel cmdline!

uint32 cpu_count = 13; // logical CPU count

bool watchdog_present = 14;
bool tpm_present = 15;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the golang library make the hardware and firmware versions of the TPM device available?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark Yes, the google/go-tpm/tpm2 library that we use on EVE, can read both hardware and firmware version info from the TPM. You can use the GetCapability command with TPMCapProperty and the relevant TPM_PT_* constants, such as TPM_PT_FIRMWARE_VERSION_1/2 for firmware and TPM_PT_MANUFACTURER / TPM_PT_VENDOR_STRING_* for vendor and other hardware properties. I will update the PR accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will add a string for TPM version, if it can be recognized by the API.

@kperakis-zededa kperakis-zededa force-pushed the pr-119-usb-info branch 2 times, most recently from b8c7a79 to 6e75bdd Compare November 25, 2025 09:06
@kperakis-zededa
Copy link
Author

Can you squash the commits? F.e. I don't think we need [move inventory into hardwareinfo](https://github.com/lf-edge/eve-api/pull/127/commits/983017d102655f967b4e682b6282717c10b12d96) in the final merge.

@christoph-zededa i squashed two of your commits, please check again and tell me if you want me to do anything else.

string vendor = 2; // e.g. "GenuineIntel"
uint32 id = 3; // logical CPU id
uint64 freq = 4; // nominal frequency
repeated CPUCache caches = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caches can be shared between processor cores. IMO it makes therefore more sense to have a list of CPUCache and have it a member repeated cpu_id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right that caches can be shared between cores, but this proto "CPU struct" represents logical CPUs, not core groups. For now we keep Cache per CPU, and if we ever need to model shared cache accross different CPUs we can extend NUMA with a shared-cache structure later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we ever need to model shared cache accross different CPUs we can extend NUMA with a shared-cache structure later

I think it is likely that we need to add it, as it is advantageous when using CPU pinning.
If we extend NUMA with a shared-cache structure, then we have the cache in the CPU message and in the NUMA message and it does not look nice when information is scattered or duplicated over two messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this structure allow you to specify that core 1 and core 2 share an L2 cache and core 3 and core 4 share another L2 cache, and all four share an L3 cache? To specify that don't you need to be able to name the two shared L2 caches so that you can tell it apart from core 1 and core 3 sharing an L2 cache (and 2 + 4 sharing the other).

This cache hiearchy seems premature and can be added later.
And NUMA is more that cache sharing - NUMA also needs to specify the placement of the memory controllers and which memory hangs off of each memory controller.
When we get to add that we should look at what the various generic + intel/amd/arm-specific tools report in terms of memory, controllers, caches, and cores.
No need to spend time on getting this correct now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to do it similar to the ghw library.
This is part of the output of ghwc topology -f json | jq .topology.nodes[0].caches on my laptop:

           {
    "level": 2,
    "type": "unified",
    "size_bytes": 1310720,
    "logical_processors": [
      6,
      7
    ]
  },
  {
    "level": 2,
    "type": "unified",
    "size_bytes": 2097152,
    "logical_processors": [
      8,
      9,
      10,
      11
    ]
  },
  {
    "level": 2,
    "type": "unified",
    "size_bytes": 2097152,
    "logical_processors": [
      12,
      13,
      14,
      15
    ]
  },
  {
    "level": 3,
    "type": "unified",
    "size_bytes": 18874368,
    "logical_processors": [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      10,
      11,
      12,
      13,
      14,
      15
    ]
  }

The NUMA node has a list of caches and each cache has a list of cpu cores.

Yes, no need to make it correct now, but we should not put obstacles on the way there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but for NUMA we need something more than just caches to be able to handle the AMD interconnect between cores/chiplets.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark @christoph-zededa Do we agree to remove caches for the time beeing and rethink about it in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay to do that later as we know have an idea how it can be extended.

The controller does not use this information yet, so we're not in an immediate hurry.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the conversation we had,we decided to leave it as is for the time beeing and rethink about that in the future!

@christoph-zededa
Copy link
Contributor

Can you squash the commits? F.e. I don't think we need [move inventory into hardwareinfo](https://github.com/lf-edge/eve-api/pull/127/commits/983017d102655f967b4e682b6282717c10b12d96) in the final merge.

@christoph-zededa i squashed two of your commits, please check again and tell me if you want me to do anything else.

One commit adds proto/hardwareinventory/msg.proto and another removes it again, so these commit can be squashed as well.

christoph-zededa and others added 2 commits November 25, 2025 13:02
add inventory message for EVE to upload to controller

Signed-off-by: Christoph Ostarek <[email protected]>
- Extend ZInfoHardware with node summary, NUMA and BIOS attributes

- Add node-level summary and device inventories to ZInfoHardware

- Refine hardware inventory proto with CPU/NUMA, BIOS and misc attrs

- Update hardware inventory API with summary fields and misc attributes

Signed-off-by: Konstantinos Perakis <[email protected]>
@christoph-zededa
Copy link
Contributor

Another thing: Can you change the PR title?

@kperakis-zededa kperakis-zededa changed the title Pr 119 usb info Harware Inventory EVE API Messages Nov 25, 2025
@eriknordmark
Copy link
Contributor

Can you squash the commits? F.e. I don't think we need [move inventory into hardwareinfo](https://github.com/lf-edge/eve-api/pull/127/commits/983017d102655f967b4e682b6282717c10b12d96) in the final merge.

@christoph-zededa i squashed two of your commits, please check again and tell me if you want me to do anything else.

One commit adds proto/hardwareinventory/msg.proto and another removes it again, so these commit can be squashed as well.

In general (and to make it easier to merge with any other PRs) the best is to put only and all the proto file changes in one commit and the derived files in a separate commit. (If there is a conflict you can then discard the second commit and regenerate it.)

repeated SmartAttr smart_attr = 8; // smartctl output
}

message USBBusDevnum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem more consistent with the PCI case to name this message USBAddress.

Comment on lines 42 to 43
optional PCIAddress pci_parent = 1;
optional USBBusDevnum usb_parent = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever have a different type of parent we have to modify this.
If you add a layer of indirection (e.g., called BusParent) which contains
optional PCIAddress pci_parent = 1;
optional USBBusDevnum usb_parent = 2;
we can later add different types of parents without changing the children.

// https://elixir.bootlin.com/linux/v6.16.9/source/include/linux/usb.h#L984
uint32 vendor_id = 3;
uint32 product_id = 4;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USB devices also have a class which would be useful to report.
Also, lsusb -t shows the driver which would be helpful to report (to see if drivers are missing).

optional PCIAddress pci_parent = 1;
optional USBBusDevnum usb_parent = 2;

string ioports = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the plural "ports" imply that this is a list? If so, how are the different items separated (comma? space?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark Good point, the name is confusing. This field is a single I/O port range string as reported by the kernel (for example "03f8-03ff" for a serial port), not a list of multiple values.

I can rename it to ioport_range (or similar) and update the comment to make the format explicit. Would that work better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a rename and comment would help.

optional USBBusDevnum usb_parent = 2;

string ifname = 3;
bool wireless = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we currently separate WiFi and Cellular? (And then there is BT as well which spec.sh doesn't look for AFAIK.)

Copy link
Author

@kperakis-zededa kperakis-zededa Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark i agree, this is what is described in the document.

I suggest the following change:

enum NetworkDeviceType {
        NETWORK_DEVICE_TYPE_UNKNOWN   = 0;
        NETWORK_DEVICE_TYPE_ETHERNET  = 1;
        NETWORK_DEVICE_TYPE_WIFI      = 2;
        NETWORK_DEVICE_TYPE_CELLULAR  = 3;
        NETWORK_DEVICE_TYPE_BT        = 4;
}

message NetworkDevice {
        BusParent bus_parent = 1;

        string ifname = 3;
        NetworkDeviceType type = 4;

        string mac_address = 5; // e.g. "00:11:22:33:44:AA"
        uint32 speed_mbps = 6;  // link speed if known
}

To add enum with some known device types. What do you think?

Copy link
Contributor

@eriknordmark eriknordmark Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have some existing definitions for the config side of the API but these are intermixed with other I/O adapters:
proto/evecommon/devmodelcommon.proto: PhyIoNetEth = 1;
proto/evecommon/devmodelcommon.proto: PhyIoNetWLAN = 5;
proto/evecommon/devmodelcommon.proto: PhyIoNetWWAN = 6;
proto/evecommon/devmodelcommon.proto: PhyIoNetEthPF = 11;
proto/evecommon/devmodelcommon.proto: PhyIoNetEthVF = 12;

Note that we haven't defined anything for BT yet.

How about renaming NETWORK_DEVICE_TYPE_CELLULAR to NETWORK_DEVICE_TYPE_WWAN and leaving out BT for now?

repeated CPUCache caches = 5;
}

message NUMA {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? Isn't it just a list of the CPUs? I see no memory-specific info in here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark You’re right, with the current definition NUMA is effectively just a list of CPUs. I followed the plan in the hardware inventory document: https://docs.google.com/document/d/1AqeDQNHneVpCvtnBDU3gIzwKI6BA_S_Bdq3MKuPW2vQ/edit?tab=t.0,

where the NUMA message only defines the CPUs, and the memory is exposed at the node level via ZInfoHardware.total_memory_bytes. As I understand it, the CPUs in a given NUMA node share the same memory.

Do you think we should also include memory information inside the NUMA message or keep it as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is confusing to call the list of CPUs "NUMA" so I would argue to just rename it (to CPUS or CPU_list). But check with @christoph-zededa and others in the team.
Later we can add NUMA information to capture the CPU to memory relationships.

uint32 cpu_count = 12; // logical CPU count

bool watchdog_present = 13;
bool tpm_present = 14;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec.sh reports only a boolean. But it probably makes sense making this a message/struct so that we can also report manufacturer and firmware versions. @shjala can provide details of what is useful, but we can fill in more details later if you define a optional message for now.

}

// Information about the system that is sent once when the system boots
message ZInfoHardware {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of this message doesn't make it clear what we consider being the hardware inventory. And a consumer probably wants to be able to compare the inventory without being confused by e.g., the eve_release string. That string is very useful, but it is not part of the hardware inventory.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eriknordmark according to the plan, we should include the hardware inventory in the existing ZInfoHardware message that we already send to the controller when the system boots.

If we rename it, we’ll likely break compatibility with previous versions. Do you agree that we should instead add a HardwareInventory message under ZInfoHardware?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some HardwareInventory inside it would make it clear what is inventory and what is not. Currently I think there is only the smart and the eve release which should be outside. I don't know if @rucoder will want some additional field for the evaluation partition names to track them. But if he doesn't respond we can worry about that later.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please update the https://github.com/lf-edge/eve-api/blob/main/APIv2.md#hardware-health-info as a result of the move of the smart counters, and add a bullet item for hardware info in https://github.com/lf-edge/eve-api/blob/main/APIv2.md#info (it appears to not have been added before).

@kperakis-zededa
Copy link
Author

Also, please update the https://github.com/lf-edge/eve-api/blob/main/APIv2.md#hardware-health-info as a result of the move of the smart counters, and add a bullet item for hardware info in https://github.com/lf-edge/eve-api/blob/main/APIv2.md#info (it appears to not have been added before).

@eriknordmark sure, lets finalize all the changes and i will update the docs.

@kperakis-zededa kperakis-zededa changed the title Harware Inventory EVE API Messages Hardware Inventory EVE API Messages Dec 3, 2025
@kperakis-zededa kperakis-zededa force-pushed the pr-119-usb-info branch 3 times, most recently from e2ccc2c to 89cb091 Compare December 5, 2025 12:07
@kperakis-zededa
Copy link
Author

@christoph-zededa @europaul @eriknordmark @rucoder can you check the PR again please? I did the changes we discussed yesterday!

Comment on lines +151 to +155
string kernel_flavor = 16; // e.g. "pc", "rt", "virt"
string kernel_version = 17; // e.g. "6.6.45-eve"
string eve_platform = 18; // board/platform name
string partition = 19; // e.g. "IMGA", "IMGB"
string kernel_cmdline = 20; // current kernel command line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful information, but it doesn't change the hardware itself. Thus a change to one of these five shouldn't be an indication that the hardware inventory changed. Thus I think it makes sense to place them one level up in the ZInfoHardware message and not in the inventory message.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about moving up some fields to the same place as string eve_release

Also. please put all the proto filechanges in one commit and all the other changes in a separate commit.

Once those are done go ahead and merge. (I'll be offline for a few hours).

Move hardware inventory API in a new hardware.proto file.

Signed-off-by: Konstantinos Perakis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants