-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Require valid aux driver version #21594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nirs The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
74d6c94
to
0d8a2c2
Compare
Rename the test after the function it tests, and change to failure message to match the function name.
Make struct fields and helper function names more readable.
We complained that "driver --version" failed but the real command is "driver version".
Previously we failed only if we could not find the driver in the PATH, or running "driver version" failed, assuming that an old driver may work. Hoping that using a invalid driver that does not return a version will work is not a good error handling strategy. Now will will fail loudly helping the user to fix the installation, or failing tests that run with invalid driver.
Previously we had special treatment for driver not found, and drive version failed. Since we treat all errors as fatal errors now, we don't need the special errors. Previously we logged the same errors at least twice; once in validateDriver() and then later in the callers. This create more noise in the log and makes it harder to debug issues. All errors are wrapped now with more context using the modern way (%w) instead of the legacy errors.Wrap(). The context was improved to describe the issue better. Handling of the special errors was also not idiomatic and not thread safe; we modified the global Err* variables at the time of the error. These special variables are removed now.
Rename arguments and temporary variable to make the code more clear.
Add a driverVersion helper and auxdriver.Version type for parsing driver version yaml output. The helper read the output of the "driver version", parses the yaml, and validate that both version and commit are set. This will make it possible to validate the driver commit hash during the tests to ensure we test the driver built from the current code. We log or include in the error message both the driver version and commit hash, to make it easier to debug issues related to using the wrong driver. This changes fixes possible issue if "driver version" command logs errors or warnings. Previously this could break the code parsing the version since we combined stdout and stderr. Now we extract stderr from the command ExitError on errors.
0d8a2c2
to
532dacb
Compare
/ok-to-test |
@nirs: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Testing with function testsStarted functional tests in one shell
Looking at the logs in another shell:
We used:
But the driver was downloaded during the test!
I would expect that with --auto-update-drivers=false the driver will not be downloaded and the tests will fail very early. This seems to be the issue - we install the driver if it does not exist even if autoUpdate is false: if !exists || (err != nil && autoUpdate) {
klog.Warningf("%s: %v", executable, err)
path = filepath.Join(directory, executable)
if err := download.Driver(executable, path, v); err != nil {
return err
}
} The driver must not be installed when autoUpdate is false. |
exit.Error(reason.DrvAuxNotHealthy, "Aux driver"+driverName, err) | ||
} //if failed to update but not a fatal error, log it and continue (old version might still work) | ||
out.WarningT("Unable to update {{.driver}} driver: {{.error}}", out.V{"driver": driverName, "error": err}) | ||
exit.Error(reason.DrvAuxNotHealthy, fmt.Sprintf("Auxiliary driver %q", driverName), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we barely ever make a new kvm aux driver, may releases can just work with old one,
plus many Embedded users (like cloud code users for example) download minikube aux drivers ones, and some of them need root (for hyperkit) and they Prefer to give this root permission once to that binary and not be forced to get a new aux version on every single minikube update.
it is too much to exit when we can run. warning is fine. no need to disrupt ppl when not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kvm driver is built from minikube source. The docker-machine-driver-kvm2 is just a small wrapper around minikube code. It does not make sense to run an old driver from previous release with new minikube since we don't know if it will work or not. Hoping that an old driver will work is a not a valid release engineering strategy.
On system with a package manager (dnf, apt) this driver should be installed by the package manager in the standard location in the same way minikube is installed. This will ensure that we always run with the right driver tested by the CI. There is no need to install a driver dynamically when software is managed by a package manager.
When minikube is installed by downloading the minikube executable, installing the driver on the first use is a nice feature, but we want to make sure the driver is the driver released with minikube.
If the driver version command does not produce the expected output, something is very wrong and it is better to fail early and loudly, helping the user to fix the issue with minimal debugging.
If you have an old driver and minkube fail to download the driver, the user can download it manually in the same way they downloaded minikube itself. If we think that downloading the driver is not reliable enough, we can provide a tarball with minikube and the driver to avoid the unreliable automatic install. If we think the automatic install is reliable enough, there is no reason to continue if minikube cannot install the right driver.
Hyperkit is depcated and should be removed in minikube 1.38 (#21601) so we can ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could consider converting the kvm
driver from an external driver into an internal driver, then when hyperkit
is gone there wouldn't be any more "aux" drivers and you wouldn't need the extra build... The virsh
command is already required by the driver, and most communication with libvirt is done through XML anyway...
That is, replace the libvirt-go
calls with the matching exec.Command - like with all other minikube drivers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using virsh instead of the libvirt api is not great since it is not designed for machines. But given the complexity and trouble caused by auxiliary drivers it sounds like a good plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kvm2 driver with docker runtime
Times for minikube start: 45.8s 41.1s 45.0s 46.1s 42.6s Times for minikube ingress: 16.3s 15.8s 16.3s 15.8s 15.8s docker driver with docker runtime
Times for minikube start: 25.9s 22.2s 21.9s 20.7s 22.6s Times for minikube ingress: 13.6s 13.6s 10.6s 13.6s 10.6s docker driver with containerd runtime
Times for minikube start: 20.1s 20.7s 22.7s 19.7s 20.3s Times for minikube ingress: 23.1s 39.1s 23.1s 23.1s 24.1s |
Here are the number of top 10 failed tests in each environments with lowest flake rate.
Besides the following environments also have failed tests:
To see the flake rates of all tests by environment, click here. |
/cc @prezha |
I think this is the wrong direction - fixing the aux driver is hard, and we don't have a reason to use aux driver for libvirt. I'll work instead on making the #21618. |
Replaced by #21625 |
This change makes driver version checking more strict. Previously we failed only if the driver was not found or running "driver version" failed. For all other errors we logged a warning and use the currently installed driver. With this change all the errors are fatal and you must have a valid driver to use the kvm or hyperkit driver.
Checking the driver version is more correct and strict. We separate driver version stdout and stderr, parse the driver output using yaml parser, and have more detailed logging and error messages.
The tests for extracting driver version from "driver version" output were replaced with test running driver version command and parsing the version.
This change makes it easy to validate the driver commit hash for addressing #21582.
Based on #21597 for testing to avoid the vfkit test failures.