Skip to content

Conversation

@elupus
Copy link
Contributor

@elupus elupus commented Jul 26, 2024

  • Read out the supported and unsupported attribute of a device at configure to be able to only list available attributes in GUI.
  • Don't swallow exceptions during configure of clusters in debug logs.

f"Failed to write attribute {name}={value}: {record.status}",
)

async def _discover_attributes_all(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wan't sure if this should go here in ZHA or in ZIGPY. The cache of unsupported attributes does exist in ZIGPY.


async def discover_unsupported_attributes(self):
"""Discover the list of unsupported attributes from the device."""
attribute_info = await self._discover_attributes_all()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i'm not fully sure what type of errors devices might respond with here. We may want to gracefully handle some things. For example, the GeneralCommand.Default_Response response is sent by some quirks here which seem utterly wrong.

await ch_specific_cfg()

self.debug("Discovering available attributes")
await self.discover_unsupported_attributes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the name of the method and the comment as written are opposites (available vs unsupported)

for cluster_handler, outcome in zip(cluster_handlers, results):
if isinstance(outcome, Exception):
cluster_handler.debug(
cluster_handler.warning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change? assume this was for quick visual in log during development?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes unrelated. But valid. Should not hide full exceptions of general type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the reconfigure dialog we show the results of the calls with success / failure. I need to update the joining UI to do something similar. It has the capability to flag failures w/ color and messaging it just isn't working atm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheJulianJES we should never generically hide base Exception types. That hides programming errors (as i found out the hard way). Error in programming caused an attribute error that was hard to track down due to this.

Its fine to swallow know errors if we know their cause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ps. I move this out separate.

@puddly
Copy link
Contributor

puddly commented Jul 26, 2024

I believe this is already handled by zigpy, no? When we read an attribute, it is recorded as unsupported if the device responds as such (https://github.com/zigpy/zigpy/blob/37f4ee65ec19f7b8af318d4ed2dc5325029020d9/zigpy/zcl/__init__.py#L553-L554). ZHA cluster handlers list out the attributes to be read and react appropriately.

Or do you mean for the frontend's cluster configuration interface? If so, this change would probably need to be moved into a special debug mode, as every extra attribute read increases the chances of initialization failing, especially for battery-powered devices.

@TheJulianJES
Copy link
Contributor

TheJulianJES commented Jul 26, 2024

Some devices, like Hue motion sensors, can report an attribute as "unsupported" during and shortly after pairing. This affects the occupancy timeout attribute they use, for example. It should not be hidden.

The clusters menu is also a debug UI. Personally, I wouldn't hide any attributes. They can also become available after a firmware update. Although we could add a symbol or change the color depending on if an attribute is known available or unsupported.

Reading out all attributes on a cluster during pairing also seems like it could cause issues with battery-powered devices that go to sleep fast. We could miss the endpoints/attributes we actually care about.
EDIT: I guess this is improved by using the discover attribute command instead of reading them all? I'm not sure if it works correctly for all devices though.

Furthermore, quirks can define new attributes after a device is already paired. We don't reinterview the device then, so whether the added attribute is shown or not might be inconsistent depending on when a device was paired.

Lastly, many devices fail on cluster configuration, as they simply don't support some things. That's why it's a debug message at the moment. I'd keep it that way.

@dmulcahey
Copy link
Contributor

dmulcahey commented Jul 26, 2024

I believe this is already handled by zigpy, no? When we read an attribute, it is recorded as unsupported if the device responds as such (https://github.com/zigpy/zigpy/blob/37f4ee65ec19f7b8af318d4ed2dc5325029020d9/zigpy/zcl/__init__.py#L553-L554). ZHA cluster handlers list out the attributes to be read and react appropriately.

Or do you mean for the frontend's cluster configuration interface? If so, this change would probably need to be moved into a special debug mode, as every extra attribute read increases the chances of initialization failing, especially for battery-powered devices.

would this be a good use for the scan device functionality? we could schedule it post join and attempt it n many times until it succeeds or we give up and make it manually triggered in the UI?

@elupus
Copy link
Contributor Author

elupus commented Jul 26, 2024

Yes its mostly for frontend. But i dont see why we read each attribute to find out if its supported instead of asking the device what is supported.

@TheJulianJES
Copy link
Contributor

TheJulianJES commented Jul 26, 2024

But i dont see why we read each attribute to find out if its supported instead of asking the device what is supported.

It would probably require some extra support for quirked devices with fake ZCL attributes. The device doesn't know we show them to zigpy/zha.
Zha-quirks tries to be "invisible" for the most part, but we can't really (and shouldn't IMO) modify the "discover attributes" command for all quirks there, so it would require some special handling in zigpy/zha.

@dmulcahey
Copy link
Contributor

Yes its mostly for frontend. But i dont see why we read each attribute to find out if its supported instead of asking the device what is supported.

I agree with this. We just need to figure out where it makes sense to put it. @puddly's point on device pairing is spot on. Especially for battery powered devices.

@elupus
Copy link
Contributor Author

elupus commented Jul 26, 2024

@TheJulianJES i think it is unlikely HUE would skip report those attributes in the discover call even at start. That list is very likely a static list in the firmware so using this service is better than looking at the response value of the attribute read.

It is also much faster to use this service since it will report many services in one call compared to trying to read them.

We could wrap the discover call in zigpy to take quirked attributes into account to force them to be included in the list.

@puddly i could not find a place where we iterate over ALL attributes in the clusters to figure out what the device support. Doing that would also be really bad due to the number of requests.

@elupus
Copy link
Contributor Author

elupus commented Jul 30, 2024

Still some issues with tests. Some tests seem to be using same mock on multiple clusters. Either that or we are configuring same cluster multiple times for some reason.

@dmulcahey
Copy link
Contributor

https://github.com/mdeweerd/zha-toolkit/blob/main/custom_components/zha_toolkit/scan_device.py

We should implement something like this to handle this functionality in ZHA. We can potentially implement this either before or after the device has completed initial configuration and initialization. I'd suggest running it in a background task here if we want it before config / init:


or we can implement it here in a chained task if we think after makes more sense:
self._device_init_tasks[device.ieee] = init_task = self.async_create_task(
maybe we can make the initial task set an event that allows the scan task to run next after initi is done if we want to do it post config / init.

we also need to differentiate between a new join of a new device and a device just returning to the network (zigbee devices can leave and rejoin the network at will). The way this is currently being implemented here will affect new joins and rejoins. On rejoins there is no need to do this again. We also will need to figure out how to handle rescans post firmware upgrades where we should force a complete rediscovery and reconfiguration of the device as signatures (clusters, attrs etc) often change with firmware upgrades.

Changing where this is being implemented will also give us finer grained control over logging so we can avoid the flying blind issue you experienced and we can suppress logs where we deem them not helpful.

@dmulcahey
Copy link
Contributor

I'd suggest running it in a background task here if we want it before config / init:

If we want to do this before the device is fully initialized we may want to move this into Zigpy to avoid race conditions between events. not sure if this quoted suggestion would cause a race or not but it may be possible. Would really need to poke at it to be sure.

@dmulcahey
Copy link
Contributor

dmulcahey commented Jul 30, 2024

Still some issues with tests. Some tests seem to be using same mock on multiple clusters. Either that or we are configuring same cluster multiple times for some reason.

Debugged this... it's failing because @patch it patching request on the cluster class not the instances and as a result devices with identify clusters on multiple endpoints end up w/ more calls in the list.

Context from docs:
Additionally, mock provides a patch() decorator that handles patching module and class level attributes within the scope of a test, along with sentinel for creating unique objects. See the quick guide for some examples of how to use Mock, MagicMock and patch().

image

With this change disco tests pass:

image

Anyways, I think we need to discuss where this functionality belongs as outlined above.

@elupus
Copy link
Contributor Author

elupus commented Jul 30, 2024

So i've moved this to async_initialize, as far as i can tell that is only done when we initially join. I have some tests i need to figure out. I don't think we need to chain it to any other task?

@dmulcahey
Copy link
Contributor

So i've moved this to async_initialize, as far as i can tell that is only done when we initially join. I have some tests i need to figure out. I don't think we need to chain it to any other task?

Initialize is called at startup, integration reload, and device pairing. For example: https://github.com/elupus/zha/blob/744a031a47694c426b78bd467b63b373b3066a0b/zha/application/gateway.py#L363

I think we want this completely separate from device joining because this can fail and be run in the background until it works without impacting device functionality.

@elupus
Copy link
Contributor Author

elupus commented Jul 30, 2024

So i've moved this to async_initialize, as far as i can tell that is only done when we initially join. I have some tests i need to figure out. I don't think we need to chain it to any other task?

Initialize is called at startup, integration reload, and device pairing. For example: https://github.com/elupus/zha/blob/744a031a47694c426b78bd467b63b373b3066a0b/zha/application/gateway.py#L363

I think we want this completely separate from device joining because this can fail and be run in the background until it works without impacting device functionality.

In theory it can effect function since some entities depend on which attributes are marked as supported (ie inverse of unsupported). I dont think it should be background..its a fast and quick thing as soon as you have connectivity.

During pairing we should be in quick poll (are we?) on the devices so this should not have any major effect on startup.

@codecov
Copy link

codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.66%. Comparing base (dfe5ae7) to head (c16d356).

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #101      +/-   ##
==========================================
+ Coverage   95.65%   95.66%   +0.01%     
==========================================
  Files          61       61              
  Lines        9255     9283      +28     
==========================================
+ Hits         8853     8881      +28     
  Misses        402      402              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@puddly
Copy link
Contributor

puddly commented Aug 1, 2024

During pairing we should be in quick poll (are we?) on the devices so this should not have any major effect on startup.

Fast polling can only be enabled for devices that have a Poll Control cluster. I don't remember off the top of my head how common this is but it's definitely only something that started appearing in newer Zigbee 3.0 devices. Many devices don't support it and once joined, they enter long polling mode and are difficult to reach.

We definitely should enable fast polling for a minute or two for any joining devices that support the cluster (and potentially every time we want to communicate with the device?). This needs to be done within zigpy, as it needs to happen as early as possible (possibly before this).

This sort of requires a communication overhaul for zigpy: we need a more reliable way to actually communicate with sleepy end devices. Right now, we rely on radio retries and it is not reliable for devices that poll infrequently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants