
@brenns10
Member

@brenns10 brenns10 commented Oct 31, 2025

This is ready for review.

Ksplice cold-patches are a pain because they are kernel modules whose
build IDs (and therefore debuginfo) do not match the in-tree module's,
even though they otherwise look just like it.

Previously we tried to detect them, and thus avoid attempting to extract
or load their debuginfo. However, in practice this doesn't seem
feasible. While I've been able to find some signals that a module may be
a cold-patch, none have generalized to all architectures and versions.

Instead, we need to handle the effects of this problem. When
cold-patches aren't handled, the ol-download and ol-local-rpm finders
repeatedly attempt to download and extract these debuginfo files,
every time they're used. We already have some safeguards to prevent
double-execution (download, then re-extract), and we can extend them
to the case where we've previously extracted the RPM: if we
already tried the file from the vmlinux_repo, then there's no point in
trying to download or extract that module again.

Signed-off-by: Stephen Brennan <[email protected]>
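
The safeguard described above could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `ExtractOnceFinder` class, its `tried` set, and the `fetch` callback are all hypothetical names standing in for the real finder and its download/extract step.

```python
from typing import Callable, Iterable, List, Set


class ExtractOnceFinder:
    """Hypothetical finder that never retries a module it already tried."""

    def __init__(self, fetch: Callable[[str], bool]) -> None:
        # fetch(name) stands in for "download and extract debuginfo";
        # it returns True only if usable debuginfo was produced.
        self._fetch = fetch
        # Modules we have already attempted, whether or not the result
        # matched the module's build ID.
        self.tried: Set[str] = set()

    def find(self, module_names: Iterable[str]) -> List[str]:
        fetched = []
        for name in module_names:
            if name in self.tried:
                # Already attempted (e.g. a cold-patched module with a
                # mismatched build ID); repeating the work cannot help.
                continue
            self.tried.add(name)
            if self._fetch(name):
                fetched.append(name)
        return fetched
```

The point of recording the attempt before checking the result is that a failed attempt is remembered too, which is the case cold-patches trigger.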
With the module API we can report the actual DWARF file that gets
loaded. But CTF wasn't explicitly reported. Given that the Oracle plugin
now handles CTF loading, we can also save the file that got loaded, so
that we can later report it for the CLI or corelens logs.

Signed-off-by: Stephen Brennan <[email protected]>
While a recent commit handled the case where we had extracted files from
a downloaded RPM, and there was a build ID mismatch, there was still the
case where the debuginfo RPM was installed to the system. Since drgn's
standard finder loads those files, we would never have the opportunity
to populate the "extracted" set for those modules. Thus, when the
debuginfo RPM is installed, it would be possible for us to try to
download and extract debuginfo in the presence of a build ID
mismatch (e.g. ksplice cold-patch). Avoid this and also report a warning.

Signed-off-by: Stephen Brennan <[email protected]>
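
As a rough sketch of the check described above, assuming the finder can see both the module's build ID and the build ID of any installed debuginfo file: `should_fetch`, its parameters, and the logger name are hypothetical, and the real plugin may structure this differently.

```python
import logging
from typing import Optional, Set

log = logging.getLogger("debuginfo_sketch")  # hypothetical logger name


def should_fetch(
    name: str,
    module_build_id: bytes,
    installed_build_id: Optional[bytes],
    tried: Set[str],
) -> bool:
    """Decide whether downloading/extracting debuginfo is worthwhile."""
    if name in tried:
        return False
    if installed_build_id is None:
        # Nothing is installed for this module, so fetching may help.
        return True
    if installed_build_id != module_build_id:
        # Installed debuginfo exists but does not match: a download would
        # yield the same mismatched file, so warn and remember the module.
        log.warning(
            "%s: installed debuginfo has a mismatched build ID "
            "(possibly a ksplice cold-patch); skipping download",
            name,
        )
        tried.add(name)
    # Assumption: a matching installed file is loaded by the standard finder.
    return False
```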
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Oct 31, 2025
This will support some of our internal customer debugging environments,
by allowing us to extract debuginfo in directories relative to the core
dumps that we are debugging.

Signed-off-by: Stephen Brennan <[email protected]>
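
One way to picture core-dump-relative extraction is the sketch below. The commit does not spell out the exact semantics, so this assumes that an absolute configured path is used unchanged and a relative one is resolved against the directory containing the core dump; `resolve_extract_dir` and both parameters are hypothetical.

```python
from pathlib import Path


def resolve_extract_dir(configured: str, core_path: str) -> Path:
    """Return the directory to extract debuginfo into (illustrative only)."""
    target = Path(configured)
    if target.is_absolute():
        # Absolute paths are taken as given.
        return target
    # Relative paths are interpreted relative to the core dump's directory.
    return Path(core_path).parent / target
```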
Right now drgn, DRGN, and corelens just stall silently while extracting.
We really should print a status message to let users know what is happening.

Signed-off-by: Stephen Brennan <[email protected]>
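
A status message around a slow step might look like the sketch below. The `with_status` helper and the message format are assumptions for illustration; the actual wording and output stream used by the tools are not stated in the commit.

```python
import sys
import time
from typing import Callable, Optional, TextIO, TypeVar

T = TypeVar("T")


def with_status(
    message: str, action: Callable[[], T], stream: Optional[TextIO] = None
) -> T:
    """Run action(), announcing it before it starts and after it finishes."""
    out = stream if stream is not None else sys.stderr
    # Flush right away so the user sees the message before the long wait.
    print(f"{message}...", end="", file=out, flush=True)
    start = time.monotonic()
    try:
        result = action()
    except Exception:
        print(" failed", file=out, flush=True)
        raise
    print(f" done ({time.monotonic() - start:.1f}s)", file=out, flush=True)
    return result
```

Writing to stderr by default keeps the status line out of any redirected command output.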
Maintaining the outfile & report parameters is a bit difficult for a few
reasons. First, the "outfile" parameter is a string filename, which
means that whenever an output must be written, the file must be opened.
Second, the "report" parameter is intended to determine the mode (append
vs write), but this becomes less than useful if you need to write
multiple things at a time: when report is False, you'll only get the
last item printed.

The intended use case for these parameters seems to be making it easy to
provide custom RDS scripts to customers. Many outputs would be too large,
so we may need to run only certain functions and redirect their output
to several files for ease of access.

To support this, let's create a @redirectable decorator. It will take
any function, and allow it to accept an "outfile" parameter. When
provided, this parameter will redirect the function's output to the
file. An optional :w or :a can be appended to the filename in order to
specify the mode (it is :w by default). All print statements can simply
write to stdout, and it will be redirected appropriately where
necessary. For example, a custom script could now be created easily:

    from drgn_tools import rds
    rds.rds_conn_info(prog, outfile="conn_info.txt")
    rds.rds_sock_info(prog, outfile="other_data.txt:a")
    rds.rdma_resource_usage(prog, outfile="other_data.txt:a")

Signed-off-by: Stephen Brennan <[email protected]>
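
For readers who want to see the shape of such a decorator, here is a minimal sketch built on `contextlib.redirect_stdout`. It follows the behavior described in the commit message (optional `outfile`, `:w` or `:a` suffix, `:w` by default), but it is not the drgn-tools implementation, and the `greet` function exists only for demonstration.

```python
import contextlib
import functools
from typing import Any, Callable, Optional


def redirectable(func: Callable[..., Any]) -> Callable[..., Any]:
    """Let func accept outfile="name[:w|:a]" to redirect its stdout."""

    @functools.wraps(func)
    def wrapper(*args: Any, outfile: Optional[str] = None, **kwargs: Any) -> Any:
        if outfile is None:
            # No redirection requested: behave exactly like func.
            return func(*args, **kwargs)
        mode = "w"  # overwrite unless ":a" is given
        if outfile.endswith((":w", ":a")):
            outfile, mode = outfile[:-2], outfile[-1]
        with open(outfile, mode) as f, contextlib.redirect_stdout(f):
            return func(*args, **kwargs)

    return wrapper


@redirectable
def greet(name: str) -> None:
    # Plain print(); the decorator decides where the output goes.
    print(f"hello {name}")
```

Because the file is opened once per call, a function that prints many items keeps all of them, which avoids the "only the last item" problem of the old report parameter.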
This will soon become moot, as we will likely be adding drgn commands
for corelens that work on 0.0.33 and later. But for now, it's useful:

>>> cl("dentrycache -l 50000", outfile="foo.txt")

Signed-off-by: Stephen Brennan <[email protected]>
@brenns10 brenns10 force-pushed the prep-2.2.x branch 4 times, most recently from bb11231 to c4ff3a5 Compare November 1, 2025 07:16
The functions themselves raise appropriate errors, but we don't want the
tests to fail on these vmcores.

Signed-off-by: Stephen Brennan <[email protected]>
This ensures we have helpers with the latest fixes for the latest
upstream kernels.

Signed-off-by: Stephen Brennan <[email protected]>
The drgn timekeeping helpers were introduced in drgn 0.0.32 and can be
used to replace our existing tk_core / shadow_timekeeper code. What's
more, they are kept up-to-date with the latest kernel changes, so long
as a recent enough drgn version is used.

Signed-off-by: Stephen Brennan <[email protected]>
There are occasional test failures on live systems where the stack
changes during a test. Of course there's no guarantee of stability here,
but let's give a grace period to reduce the chances and hopefully avoid
the test failure.

Signed-off-by: Stephen Brennan <[email protected]>
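
The commit does not say how the grace period is implemented. One possible shape, shown purely as an assumption, is to retry the check a few times with a short pause; `retry_with_grace` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_grace(check: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Re-run a check that may fail transiently on a live system."""
    for attempt in range(attempts):
        try:
            return check()
        except AssertionError:
            if attempt == attempts - 1:
                # Out of retries: surface the real failure.
                raise
            # Give the system a moment to settle before trying again.
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

This only lowers the odds of a spurious failure; as the message notes, a live system offers no stability guarantee.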
It has been a long time since the readme got touched, and it's a bit out
of date. Update it to focus more heavily on Corelens, give CTF a
mention, and link to OL documentation. Also, briefly describe how to
use the debuginfo plugin.

Signed-off-by: Stephen Brennan <[email protected]>
The "kvm" corelens module should not run unless the kvm kernel module is
loaded and debuginfo is present.

Signed-off-by: Stephen Brennan <[email protected]>
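
The gating rule can be expressed as a small predicate. This is a sketch under assumptions, not corelens code: `should_run` and the mapping from loaded module name to "debuginfo present" are hypothetical.

```python
from typing import Iterable, Mapping


def should_run(required_kmods: Iterable[str], debuginfo_loaded: Mapping[str, bool]) -> bool:
    """Return True only if every required kernel module is usable.

    debuginfo_loaded maps each *loaded* kernel module name to whether its
    debuginfo is present; a module missing from the mapping is not loaded.
    """
    return all(debuginfo_loaded.get(name, False) for name in required_kmods)
```

For example, `should_run(["kvm"], {"kvm": False})` is false because the module is loaded but lacks debuginfo.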
When reading logs it's not always obvious which test run resulted in a
failure. Log the full details of the test so that it is easier to
identify.

Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Member

@biger410 biger410 left a comment


Looks good.

@brenns10
Member Author

brenns10 commented Nov 4, 2025

Thank you! The test failure is only due to the UEK7 debuginfo being missing for the latest release, for some reason. I think it's just a race condition and it will be uploaded soon. In any case, I've done quite a bit of other testing so I'm confident that the tests do pass.

@brenns10 brenns10 merged commit 13ff4ce into oracle-samples:main Nov 4, 2025
2 of 4 checks passed

Labels

allow-missing-latest, OCA Verified (All contributors have signed the Oracle Contributor Agreement.)


2 participants