Fix GPU power_draw & power_limit for schema v13 #891
General information
In nvidia-smi XML schema v13, the elements that hold the GPU's power draw and power limit have changed. This PR adds a workaround for this, similar to a workaround that already exists in the nvidia_smi plugin.
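The fallback idea behind both workarounds (try the newest element name first, then fall back to older ones) can be sketched roughly as follows. This is a hypothetical illustration, not the plugin's actual code: the function names are made up, and the v13 element names (instant_power_draw, average_power_draw, current_power_limit) are assumptions that should be checked against real `nvidia-smi -q -x` output.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Candidate tag names, newest schema first. The newer names are ASSUMPTIONS
# for illustration; verify them against actual `nvidia-smi -q -x` output.
POWER_SECTIONS = ("gpu_power_readings", "power_readings")
POWER_DRAW_TAGS = ("instant_power_draw", "average_power_draw", "power_draw")
POWER_LIMIT_TAGS = ("current_power_limit", "power_limit")


def first_text(parent: ET.Element, tags: tuple[str, ...]) -> Optional[str]:
    """Return the stripped text of the first child whose tag is in `tags`."""
    for tag in tags:
        elem = parent.find(tag)
        if elem is not None and elem.text and elem.text.strip():
            return elem.text.strip()
    return None


def parse_watts(text: Optional[str]) -> Optional[float]:
    """Convert a value like '45.12 W' to 45.12; return None for 'N/A' or missing."""
    if text is None:
        return None
    try:
        return float(text.split()[0])
    except ValueError:
        return None


def read_power(gpu: ET.Element) -> tuple[Optional[float], Optional[float]]:
    """Return (power_draw, power_limit) in watts for one <gpu> element."""
    for section_tag in POWER_SECTIONS:
        section = gpu.find(section_tag)
        if section is not None:
            draw = parse_watts(first_text(section, POWER_DRAW_TAGS))
            limit = parse_watts(first_text(section, POWER_LIMIT_TAGS))
            return draw, limit
    return None, None  # no known power section in this schema version


# Usage with a hand-written sample in the assumed newer layout.
sample = """<nvidia_smi_log><gpu id="00000000:01:00.0">
<gpu_power_readings><instant_power_draw>45.12 W</instant_power_draw>
<current_power_limit>250.00 W</current_power_limit></gpu_power_readings>
</gpu></nvidia_smi_log>"""
print(read_power(ET.fromstring(sample).find("gpu")))  # (45.12, 250.0)
```

Each schema change only adds another candidate name to a tuple, which keeps the patch small but grows the list over time.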
The workaround currently follows the same logic as an existing one (where power_readings was renamed to gpu_power_readings, see f99722a & #681). Simply adding to the pile doesn't feel very elegant, but it works. While researching, I came across Telegraf, which can also parse the XML output of nvidia-smi. It reads the schema version from the DOCTYPE and then uses a parser for that specific version (https://github.com/influxdata/telegraf/blob/master/plugins/inputs/nvidia_smi/nvidia_smi.go#L112). Depending on how these schemas evolve, that may be a more elegant solution.
Bug reports
After upgrading the NVIDIA driver on one of my machines from 530 to 580, I noticed that the power monitoring in CheckMK showed less information. The 'Power Draw' figure disappeared from the host services overview, as did the graph in the service detail view, and the power limit was reported as None.
The nvidia-smi man page mentions the following:
Proposed changes
Expected behavior: the power draw and power limit are available in the web interface.
Observed behavior: these values are not available in the web interface.