Prep 2.2.0 release #189
Ksplice cold-patches are a pain because they are kernel modules whose build IDs (and debuginfo) are mismatched, but they otherwise look just like the in-tree module. Previously we tried to detect them, and thus avoid attempting to extract or load their debuginfo. However, in practice this doesn't seem feasible. While I've been able to find some signals that a module may be a cold-patch, none have generalized to all architectures and versions.

Instead, we need to just handle the effects of this problem. When cold-patches aren't handled, the ol-download and ol-local-rpm finders repeatedly attempt to download and extract these debuginfo files every time they're used. We already have some safeguards to prevent double-execution (download, then re-extract), but we can extend this safeguard to the case where we've previously extracted the RPM. If we already tried the file from the vmlinux_repo, then there's no point in trying to download or extract that module again.

Signed-off-by: Stephen Brennan <[email protected]>
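As a rough illustration of the safeguard described above, the sketch below remembers the outcome of the first attempt for each module, so a build ID mismatch is only discovered once. The class name `RememberingFinder`, the `fetch` callback, and its return shape are hypothetical and are not taken from the drgn-tools finders.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical callback: given a module name, download/extract its debuginfo
# and return (path_to_file, build_id_found_in_that_file).
Fetcher = Callable[[str], Tuple[str, str]]


class RememberingFinder:
    """Illustrative finder that never fetches the same module twice."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        # module name -> usable debuginfo path, or None after a mismatch
        self._results: Dict[str, Optional[str]] = {}

    def find(self, module: str, wanted_build_id: str) -> Optional[str]:
        # If this module was already tried, reuse the earlier outcome
        # instead of downloading or extracting again.
        if module in self._results:
            return self._results[module]
        path, found_build_id = self._fetch(module)
        # A mismatched build ID (e.g. a cold-patched module) is recorded
        # as "no usable debuginfo" so later calls return immediately.
        result = path if found_build_id == wanted_build_id else None
        self._results[module] = result
        return result
```

The point of recording failures as well as successes is that the expensive step runs at most once per module, whether or not the build IDs matched.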
With the module API we can report the actual DWARF file that gets loaded. But CTF wasn't explicitly reported. Given that the Oracle plugin now handles CTF loading, we can also save the file that got loaded, so that we can later report it for the CLI or corelens logs.
Signed-off-by: Stephen Brennan <[email protected]>
While a recent commit handled the case where we had extracted files from a downloaded RPM, and there was a build ID mismatch, there was still the case where the debuginfo RPM was installed to the system. Since drgn's standard finder loads those files, we would never have the opportunity to populate the "extracted" set for those modules. Thus, when the debuginfo RPM is installed, it would be possible for us to try to download and extract debuginfo in the presence of a build ID mismatch (e.g. ksplice cold-patch). Avoid this and also report a warning.
Signed-off-by: Stephen Brennan <[email protected]>
This will support some of our internal customer debugging environments, by allowing us to extract debuginfo in directories relative to the core dumps that we are debugging.
Signed-off-by: Stephen Brennan <[email protected]>
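One way to picture "relative to the core dump" is the small sketch below, which resolves a configured extraction directory against the directory containing the vmcore. The function name `resolve_extract_dir` and both parameters are assumptions made for illustration; the actual option names and behavior in drgn-tools may differ.

```python
from pathlib import Path


def resolve_extract_dir(configured: str, core_path: str) -> Path:
    """Hypothetical helper: pick the directory to extract debuginfo into."""
    target = Path(configured)
    # Absolute paths are used unchanged.
    if target.is_absolute():
        return target
    # Relative paths are interpreted relative to the core dump's directory,
    # so each vmcore gets its own extraction area next to it.
    return Path(core_path).parent / target
```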
Right now drgn, DRGN, & corelens just pause silently while extracting. We really should print a status message to let users know what is happening.
Signed-off-by: Stephen Brennan <[email protected]>
Maintaining the outfile & report parameters is a bit difficult for a few
reasons. First, the "outfile" parameter is a string filename, which
means that whenever an output must be written, the file must be opened.
Second, the "report" parameter is intended to determine the mode (append
vs write), but this becomes less than useful if you need to write
multiple things at a time: when report is False, you'll only get the
last item printed.
The intended use case for these parameters seems to be so that we can
easily provide custom RDS scripts to customers. The idea being that many
outputs would be too large, so we may need to only run certain
functions, and redirect output to several files for ease of access.
To support this, let's create a @redirectable decorator. It will take
any function, and allow it to accept an "outfile" parameter. When
provided, this parameter will redirect the function's output to the
file. An optional :w or :a can be appended to the filename in order to
specify the mode (it is :w by default). All print statements can simply
write to stdout, and it will be redirected appropriately where
necessary. For example, a custom script could now be created easily:
    from drgn_tools import rds
    rds.rds_conn_info(prog, outfile="conn_info.txt")
    rds.rds_sock_info(prog, outfile="other_data.txt:a")
    rds.rdma_resource_usage(prog, outfile="other_data.txt:a")
Signed-off-by: Stephen Brennan <[email protected]>
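For readers who want to see how such a decorator could work, here is a minimal sketch based only on the behavior described in the commit message above (an "outfile" keyword, an optional :w or :a suffix, :w by default). It is not the drgn-tools implementation, and the `greet` function is a made-up stand-in for a helper that prints to stdout.

```python
import contextlib
import functools
from typing import Any, Callable, Optional


def redirectable(func: Callable[..., Any]) -> Callable[..., Any]:
    """Let any printing function accept an optional outfile="name[:w|:a]"."""

    @functools.wraps(func)
    def wrapper(*args: Any, outfile: Optional[str] = None, **kwargs: Any) -> Any:
        if outfile is None:
            # No redirection requested: print to stdout as usual.
            return func(*args, **kwargs)
        path, mode = outfile, "w"  # write (truncate) is the default mode
        if outfile.endswith((":w", ":a")):
            path, mode = outfile[:-2], outfile[-1]
        # Everything the function prints while the file is open goes to it.
        with open(path, mode) as f, contextlib.redirect_stdout(f):
            return func(*args, **kwargs)

    return wrapper


@redirectable
def greet(name: str) -> None:
    # Hypothetical helper: it only ever writes to stdout.
    print(f"hello {name}")
```

With this shape, the decorated function never opens a file itself, which is what allows every print statement to keep writing to stdout.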
This will soon become moot, as we will likely be adding drgn commands
for corelens that work on 0.0.33 and later. But for now, it's useful:
    >>> cl("dentrycache -l 50000", outfile="foo.txt")
Signed-off-by: Stephen Brennan <[email protected]>
bb11231 to c4ff3a5
The functions themselves raise appropriate errors, but we don't want the tests to fail on these vmcores.
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
This ensures we have helpers with the latest fixes for the latest upstream kernels.
Signed-off-by: Stephen Brennan <[email protected]>
The drgn timekeeping helpers were introduced in drgn 0.0.32 and can be used to replace our existing tk_core / shadow_timekeeper code. What's more, they are kept up-to-date with the latest kernel changes, so long as a recent enough drgn version is used.
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
There are occasional test failures on live systems where the stack changes during a test. Of course there's no guarantee of stability here, but let's give a grace period to reduce the chance of the stack changing mid-test and hopefully avoid the failure.
Signed-off-by: Stephen Brennan <[email protected]>
It has been a long time since the readme was touched, and it's a bit out of date. Update it to focus more heavily on Corelens, give CTF a mention, and link to OL documentation. Also, give a brief description of how to use the debuginfo plugin.
Signed-off-by: Stephen Brennan <[email protected]>
The "kvm" corelens module should not run unless the kvm kernel module is loaded and debuginfo is present.
Signed-off-by: Stephen Brennan <[email protected]>
When reading logs it's not always obvious which test run resulted in a failure. Log the full details of the test so that it is easier to identify.
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
biger410 left a comment
Looks good.
Thank you! The test failure is only due to the UEK7 debuginfo being missing for the latest release, for some reason. I think it's just a race condition and it will be uploaded soon. In any case, I've done quite a bit of other testing so I'm confident that the tests do pass.
This is ready for review.