Skip to content

Conversation

@TheJulianJES
Copy link
Contributor

Proposed change

This fixes an issue where ZHA doesn't claim the cluster handler for quirks v2 entities that only require the specified attribute to be read during device initialization (pairing), but not bound, causing the cluster handler to not be configured, thus never reading the required attributes, leading to "unknown" state for affected quirks v2 entities after initial pairing.

Two new tests are also added to (1) test that the cluster handler is claimed in the above mentioned scenario, where no binding is configured, and (2) to test that setting up a quirks v2 "command button" entity doesn't claim the cluster handler unnecessarily.

Note

We currently claim cluster handlers for all quirks v2 metadata that contains attribute_name, which I believe basically all do, except for command button. We don't really need to claim a cluster handler + read the attribute for "write attribute buttons" though. We may wanna change this behavior in this PR.

Additional information

For more information on the underlying issues this fixes, see: zigpy/backlog#52

Note on tests

The tests may be a bit misplaced in test_sensor and test_button. They do (need to) utilize quirks v2 entities, but it's really the underlying (quirks v2 device) discovery that is tested.

For now, I'd leave them in-place to minimize the diff, and they do test quirks v2 entities somewhat. In the future, we may wanna move the tests though.

@codecov
Copy link

codecov bot commented Oct 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.95%. Comparing base (b172549) to head (bcf7b37).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #548   +/-   ##
=======================================
  Coverage   96.94%   96.95%           
=======================================
  Files          63       63           
  Lines       10518    10525    +7     
=======================================
+ Hits        10197    10204    +7     
  Misses        321      321           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@puddly puddly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I've left a few comments.

Long-term, I would like to explore a way to get rid of cluster handlers. Maybe we can have class-based entities (internal to ZHA) provide a nicer way to set up binding and reporting config.

.command_button(
FakeManufacturerCluster.ServerCommandDefs.self_test.name,
OppleCluster.cluster_id,
command_args=(5,),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nitpick:

Suggested change
command_args=(5,),
command_kwargs={"identify_time": 5},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Copied that from some other test, but I'll change that.
We'll probably want to adjust this in a lot of existing tests as well at a later date, including (completely) using kwargs for creating quirks v2 entities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeManufacturerCluster.ServerCommandDefs.self_test.name should also be changed to OppleCluster[...]. It's the same command name and we never execute this, since we just need it for testing the quirks v2 metadata processing in the device discovery, but it should still be adjusted.

.add_to_registry()
)

zigpy_device = create_mock_zigpy_device(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nitpick, feel free to ignore if it's too much trouble. Is there any way to test using JSON diagnostics from a real device? I'd like to phase out use of create_mock_zigpy_device and other synthetic testing setups in favor of using real devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I explicitly used create_mock_zigpy_device, as that's what the similar tests in test_sensor used and especially here, we really want to make sure that we get this exact device with that exact cluster setup.

I feel like we always run into a bit of a risk with using JSON diagnostics in tests, as it's not clear how endpoints and clusters are layed out by just looking at the tests. You have to look at the (huge) JSON file. It's also harder to get a device exactly in the way you want it, without adding unnecessary endpoints/clusters/quirk interference.

Some parts of the device/diagnostics can change over time as well (i.e. ZHA or quirk doing something different, firmware update, ...), when we may not want that for all tests where the diagnostics are used in.

[cluster_handler.name],
)

# if the cluster handler is unclaimed, claim it and set BIND accordingly,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: should we move this to happen before the yield? That way, the cluster handler is fully set up before the entity initialization happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really do that in a nice way, since we need to iterate over all quirks v2 entity metadata for a specific cluster handler before we can make the final decision on whether we can set BIND to False.

This is all still done "pre initialization", since they're just added to Device._pending_entities in discovery here:

zha/zha/zigbee/device.py

Lines 939 to 950 in b172549

new_entities = discovery.DEVICE_PROBE.discover_device_entities(self)
# Discover all applicable entities
for entity in new_entities:
if self._is_entity_removed_by_quirk(entity):
continue
# Apply any metadata changes from quirks v2
self._apply_entity_metadata_changes(entity)
entity.on_add()
self._pending_entities.append(entity)

Nothing in entity.on_add() can also ever rely on the cluster handler being claimed or not.
The actual cluster handler initialization is done way later, where it then matters if a cluster handler is claimed or not. I think it's fine to keep it like this tbh.

self._discover_new_entities() does everything we do here. Only afterwards is endpoint.async_initialize called:

zha/zha/zigbee/device.py

Lines 956 to 969 in b172549

self._discover_new_entities()
await self._zdo_handler.async_initialize(from_cache)
self._zdo_handler.debug("'async_initialize' stage succeeded")
# We intentionally do not use `gather` here! This is so that if, for example,
# three `device.async_initialize()`s are spawned, only three concurrent requests
# will ever be in flight at once. Startup concurrency is managed at the device
# level.
for endpoint in self._endpoints.values():
try:
await endpoint.async_initialize(from_cache)
except Exception: # pylint: disable=broad-exception-caught
self.debug("Failed to initialize endpoint", exc_info=True)

# if the cluster handler is unclaimed, claim it and set BIND accordingly,
# so ZHA configures the cluster handler: reporting + reads attributes
if (attribute_initialization_found or reporting_found) and (
cluster_handler not in endpoint.claimed_cluster_handlers.values()
Copy link
Contributor

@puddly puddly Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way for the cluster handler to already be claimed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster handler can already be claimed by entities from EndpointProbe. Since those ZHA entities may rely on the default of BIND = True, I've added the cluster_handler not in endpoint.claimed_cluster_handlers.values() check to make sure we only ever set BIND = False for cluster handlers only claimed for quirks v2 entities.

This is also checked by the tests.

@TheJulianJES
Copy link
Contributor Author

Long-term, I would like to explore a way to get rid of cluster handlers. Maybe we can have class-based entities (internal to ZHA) provide a nicer way to set up binding and reporting config.

Yeah, I think they're not too bad, but they could definitely use some improvement (or yeah, possibly be removed entirely).

For quirks v2, we now automatically handle setting up cluster handlers under the hood (claiming + attribute initialization + reporting), so we can definitely change how we do that, without breaking the quirks v2 API.
As a first step, we might be able to introduce something similar to that for the "ZHA entities", so that relevant attributes are automatically read and configured from reporting, just by defining the entity alone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants