[LTS 9.2] CVE-2025-38084, CVE-2025-38085, CVE-2024-57883 #731
base: ciqlts9_2
314f5ab to
cda475b
jira VULN-71577
cve-pre CVE-2025-38084
commit-author James Houghton <[email protected]>
commit b30c14c
upstream-diff Stable 5.15 backport bd9a23a was used for the actual (clean) cherry-pick

PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however, it is possible that HugeTLB VMAs are split without unsharing the PMDs first.

Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE in hugetlb_change_protection [1]. The key there is that hugetlb_unshare_all_pmds will not attempt to unshare PMDs in non-PUD_SIZE-aligned sections of the VMA.

It might seem ideal to unshare in hugetlb_vm_op_open, but we need to unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split seems natural.

[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6dfeaff ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b30c14c)
Signed-off-by: Marcin Wcisło <[email protected]>
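To make the PUD_SIZE alignment argument in the commit message above concrete, here is a minimal user-space sketch of how a split handler could decide which PUD-sized window around the split address needs unsharing. It is an illustration under stated assumptions, not the kernel code: the `sketch_` names are hypothetical, and the 1 GiB PUD size assumes x86_64 with 4 KiB base pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86_64 with 4 KiB base pages, where one PUD entry spans 1 GiB. */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

static_assert((SKETCH_PUD_SIZE & (SKETCH_PUD_SIZE - 1)) == 0,
              "PUD size must be a power of two for the mask arithmetic");

/* Hypothetical helper type: a half-open address range [start, end). */
struct sketch_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Given a VMA [vm_start, vm_end) that is about to be split at addr, report
 * the PUD-aligned window whose PMD page could be shared and would straddle
 * the split. Returns false when nothing needs unsharing.
 */
bool sketch_split_unshare_range(uint64_t vm_start, uint64_t vm_end,
                                uint64_t addr, struct sketch_range *out)
{
    /* A PUD-aligned split never cuts through a shareable PMD page. */
    if ((addr & ~SKETCH_PUD_MASK) == 0)
        return false;

    uint64_t floor = addr & SKETCH_PUD_MASK;  /* window start */
    uint64_t ceil = floor + SKETCH_PUD_SIZE;  /* window end   */

    /* Sharing is only possible if the whole window lies inside the VMA. */
    if (floor < vm_start || ceil > vm_end)
        return false;

    out->start = floor;
    out->end = ceil;
    return true;
}
```

For example, splitting a VMA that covers [0, 4 GiB) at 1 GiB + 2 MiB yields the window [1 GiB, 2 GiB), while a split exactly at 2 GiB yields nothing to unshare.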
jira VULN-46929
cve CVE-2024-57883
commit-author Liu Shixin <[email protected]>
commit 59d9094
upstream-diff Stable 5.15 backport 8410996eb6fea116fe1483ed977aacf580eee7b4 was used for the actual (clean) cherry-pick. Additionally the `atomic_t pt_share_count' field in `include/linux/mm_types.h' was wrapped in RH_KABI_BROKEN_INSERT macro to avoid kABI checker complaints. It's justified, because the inserted field (it's included, as CONFIG_ARCH_WANT_HUGE_PMD_SHARE gets enabled for at least `kernel-x86_64-rhel.config') is placed within a union which already contained a field of the same type `atomic_t pt_frag_refcount', so the size of it cannot change.

The folio refcount may be increased unexpectly through try_get_folio() by caller such as split_huge_pages. In huge_pmd_unshare(), we use refcount to check whether a pmd page table is shared. The check is incorrect if the refcount is increased by the above caller, and this can cause the page table leaked:

  BUG: Bad page state in process sh pfn:109324
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
  flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
  page_type: f2(table)
  raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
  raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
  page dumped because: nonzero mapcount
  ...
  CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G B 6.13.0-rc2master+ #7
  Tainted: [B]=BAD_PAGE
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Call trace:
   show_stack+0x20/0x38 (C)
   dump_stack_lvl+0x80/0xf8
   dump_stack+0x18/0x28
   bad_page+0x8c/0x130
   free_page_is_bad_report+0xa4/0xb0
   free_unref_page+0x3cc/0x620
   __folio_put+0xf4/0x158
   split_huge_pages_all+0x1e0/0x3e8
   split_huge_pages_write+0x25c/0x2d8
   full_proxy_write+0x64/0xd8
   vfs_write+0xcc/0x280
   ksys_write+0x70/0x110
   __arm64_sys_write+0x24/0x38
   invoke_syscall+0x50/0x120
   el0_svc_common.constprop.0+0xc8/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x34/0x128
   el0t_64_sync_handler+0xc8/0xd0
   el0t_64_sync+0x190/0x198

The issue may be triggered by damon, offline_page, page_idle, etc, which will increase the refcount of page table.

1. The page table itself will be discarded after reporting the "nonzero mapcount".

2. The HugeTLB page mapped by the page table miss freeing since we treat the page table as shared and a shared page table will not be unmapped.

Fix it by introducing independent PMD page table shared count. As described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390 gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv pmds, so we can reuse the field as pt_share_count.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Liu Shixin <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Ken Chen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 59d9094)
Signed-off-by: Marcin Wcisło <[email protected]>
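The two claims in the commit message above (a generic refcount is an unreliable "is shared" signal, and adding an `atomic_t` to a union that already holds one cannot grow it) can be illustrated with a small user-space sketch. Everything here is a simplified stand-in: `sketch_atomic_t` is a plain struct rather than a real atomic, the unions only loosely mirror the fields named in the message, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's atomic_t; NOT atomic, for illustration only. */
typedef struct { int counter; } sketch_atomic_t;

/* Simplified view of the union before the fix. */
union sketch_pt_before {
    void *pt_mm;
    sketch_atomic_t pt_frag_refcount;
};

/* The same union with the dedicated share counter added. */
union sketch_pt_after {
    void *pt_mm;
    sketch_atomic_t pt_frag_refcount;
    sketch_atomic_t pt_share_count;
};

/* A member of an already-present type cannot change size or alignment. */
static_assert(sizeof(union sketch_pt_before) == sizeof(union sketch_pt_after),
              "union size must stay the same");
static_assert(_Alignof(union sketch_pt_before) == _Alignof(union sketch_pt_after),
              "union alignment must stay the same");

/* Hypothetical page-table page: a generic refcount plus the union. */
struct sketch_pt_page {
    int refcount;
    union sketch_pt_after u;
};

/* Old-style check: any extra reference looks like sharing. */
bool sketch_shared_by_refcount(const struct sketch_pt_page *p)
{
    return p->refcount > 1;
}

/* New-style check: only explicit sharing bumps the dedicated counter. */
bool sketch_shared_by_share_count(const struct sketch_pt_page *p)
{
    return p->u.pt_share_count.counter > 0;
}
```

If an unrelated caller takes a temporary reference on an unshared page (refcount goes from 1 to 2), the refcount-based check reports it as shared while the dedicated counter still reports it as unshared, which is the misclassification the commit message describes.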
jira VULN-71577
cve CVE-2025-38084
commit-author Jann Horn <[email protected]>
commit 081056d
upstream-diff Stable 5.15 backport 366298f2b04d2bf1f2f2b7078405bdf9df9bd5d0 was used for the actual (clean) cherry-pick

Currently, __split_vma() triggers hugetlb page table unsharing through vm_ops->may_split(). This happens before the VMA lock and rmap locks are taken - which is too early, it allows racing VMA-locked page faults in our process and racing rmap walks from other processes to cause page tables to be shared again before we actually perform the split.

Fix it by explicitly calling into the hugetlb unshare logic from __split_vma() in the same place where THP splitting also happens. At that point, both the VMA and the rmap(s) are write-locked.

An annoying detail is that we can now call into the helper hugetlb_unshare_pmds() from two different locking contexts:

1. from hugetlb_split(), holding:
   - mmap lock (exclusively)
   - VMA lock
   - file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to call us with only the mmap lock held (in shared mode), but currently only runs while holding mmap lock (exclusively) and VMA lock

Backporting note:
This commit fixes a racy protection that was introduced in commit b30c14c ("hugetlb: unshare some PMDs when splitting VMAs"); that commit claimed to fix an issue introduced in 5.13, but it should actually also go all the way back.

[[email protected]: v2]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Jann Horn <[email protected]>
Cc: Liam Howlett <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]> [b30c14c: hugetlb: unshare some PMDs when splitting VMAs]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 081056d)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-71586
cve CVE-2025-38085
commit-author Jann Horn <[email protected]>
commit 1013af4
upstream-diff Stable 5.15 backport a3d864c901a300c295692d129159fc3001a56185 was used for the actual cherry-pick. Additionally the 2ba99c5 minus changes in `mm/khugepaged.c' was included to expose the `tlb_remove_table_sync_one' function.

huge_pmd_unshare() drops a reference on a page table that may have previously been shared across processes, potentially turning it into a normal page table used in another process in which unrelated VMAs can afterwards be installed.

If this happens in the middle of a concurrent gup_fast(), gup_fast() could end up walking the page tables of another process. While I don't see any way in which that immediately leads to kernel memory corruption, it is really weird and unexpected.

Fix it with an explicit broadcast IPI through tlb_remove_table_sync_one(), just like we do in khugepaged when removing page tables for a THP collapse.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 1013af4)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-46929
cve-bf CVE-2024-57883
commit-author Miaohe Lin <[email protected]>
commit 3aa4ed8
upstream-diff Accounted for e95a985 not being backported to ciqlts9_2 - dropped the unnecessary braces in a one-statement `if' conditional.

If the pagetables are shared, we shouldn't copy or take references. Since src could have unshared and dst shares with another vma, huge_pte_none() is thus used to determine whether dst_pte is shared. But this check isn't reliable. A shared pte could have pte none in pagetable in fact. The page count of ptep page should be checked here in order to reliably determine whether pte is shared.

[[email protected]: remove unused local variable dst_entry in copy_hugetlb_page_range()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 3aa4ed8)
Signed-off-by: Marcin Wcisło <[email protected]>
jira VULN-46929
cve-bf CVE-2024-57883
commit-author Jane Chu <[email protected]>
commit 14967a9
upstream-diff |
  include/linux/mm_types.h
    Removed the definition of `ptdesc_pmd_is_shared()' function in alignment with stable-5.15 backport 8410996eb6fea116fe1483ed977aacf580eee7b4 (it omits the definition of `ptdesc_pmd_pts_*()' functions family, to which `ptdesc_pmd_is_shared()' belongs).
  mm/hugetlb.c
    copy_hugetlb_page_range()
      1. Used CONFIG_ARCH_WANT_HUGE_PMD_SHARE instead of CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING, because the latter was introduced only in the non-backported commit 188cac5.
      2. Since `ptdesc_pmd_is_shared()' was not defined, read the `pt_share_count' field directly, as is done in the stable-5.15 backport 8410996eb6fea116fe1483ed977aacf580eee7b4. (Compare changes to `huge_pmd_unshare()' in `mm/hugetlb.c' between upstream 59d9094 and stable-5.15 8410996eb6fea116fe1483ed977aacf580eee7b4.)
    huge_pmd_unshare()
      No change to the conditional. It was arguably not needed in the upstream as well, probably introduced only for the sake of clarity in the presence of `ptdesc_pmd_is_shared()' function, which is missing here.

commit 59d9094 ("mm: hugetlb: independent PMD page table shared count") introduced ->pt_share_count dedicated to hugetlb PMD share count tracking, but omitted fixing copy_hugetlb_page_range(), leaving the function relying on page_count() for tracking that no longer works.

When lazy page table copy for hugetlb is disabled, that is, revert commit bcd51a3 ("hugetlb: lazy page table copies in fork()") fork()'ing with hugetlb PMD sharing quickly lockup -

  [ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
  [ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
  [ 239.446631] Call Trace:
  [ 239.446633] <TASK>
  [ 239.446636] _raw_spin_lock+0x3f/0x60
  [ 239.446639] copy_hugetlb_page_range+0x258/0xb50
  [ 239.446645] copy_page_range+0x22b/0x2c0
  [ 239.446651] dup_mmap+0x3e2/0x770
  [ 239.446654] dup_mm.constprop.0+0x5e/0x230
  [ 239.446657] copy_process+0xd17/0x1760
  [ 239.446660] kernel_clone+0xc0/0x3e0
  [ 239.446661] __do_sys_clone+0x65/0xa0
  [ 239.446664] do_syscall_64+0x82/0x930
  [ 239.446668] ? count_memcg_events+0xd2/0x190
  [ 239.446671] ? syscall_trace_enter+0x14e/0x1f0
  [ 239.446676] ? syscall_exit_work+0x118/0x150
  [ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
  [ 239.446681] ? clear_bhb_loop+0x30/0x80
  [ 239.446684] ? clear_bhb_loop+0x30/0x80
  [ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e

There are two options to resolve the potential latent issue:
  1. warn against PMD sharing in copy_hugetlb_page_range(),
  2. fix it.
This patch opts for the second option.
While at it, simplify the comment, the details are not actually relevant anymore.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: Jane Chu <[email protected]>
Reviewed-by: Harry Yoo <[email protected]>
Acked-by: Oscar Salvador <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 14967a9)
Signed-off-by: Marcin Wcisło <[email protected]>
cda475b to
111f19f
Needs removed
We cannot backport this one despite being a CVE
The
In addition, Red Hat has yet to fix this CVE, I SUSPECT for the exact same reason: this breaks a core structure and they're doing everything they can to not address it. It seems we should just need to insert the new code to but it's not clean.
We can also drop these BFs as well:
Needs minor rework
Could you elaborate on that please?
Not quite. The trick with the unions in
It has a relatively low CVSS score (5.5) and also severity level "LOW". I often see such CVEs neglected by RH, this could be one of them. I wasn't 100% sure about this kABI issue admittedly, so I'm glad we're having this discussion.
The resolution for the kABI failure is correct. There isn't a kABI break here, so it is correct to use
The only functional change in commit 2ba99c5 is that made to the
I opted for (3).
You mean to do (3) as a separate commit? Or to do (2)?
To be clear @kerneltoast corrected me on some To clarify there is So i'll take back my
2 ... the exposure of the unless this exposes something really gross, my quick glance is it looks like it should slot right in. But it's also fixing stuff so that is also good.
Now that I tried to plug it in quickly I recollect why I avoided it - there are conflicts, and they don't seem trivial. Would have to take a closer look at the history of changes, especially given that the 5.15 version is different from upstream in that
Yes, I get the levels of thought here. I was wondering though, if some union was "public", in a sense that it could be used not by its owner (which allocated and initialized it), then there would have to be some switch somewhere (a variable) encoding how it should be interpreted? It would therefore be reasonable to assume that a driver, or any other user, checks the value of this switch before using the union. Introducing a new field to it would imply using a new value for the switch, which the driver should be able to recognize that it doesn't recognize, and reject the unknown data structure gracefully. At least that was my thinking which I saved from Is it too much to expect from drivers? Or is my assumption wrong and can unions be used without such switch somehow?
What is the conflict? Maybe I missed something when I glanced at the insertion points. Can you show the conflicts in a diff in a comment?
I think it's hard to know how good the OOT drivers are because we can't always see their code. The big thing to consider is that even MAJOR driver vendors in the kernel src tree get regularly yelled at because of bad code quality, and usually those LKML requests come from their OOT 'dev' driver. My general assumption is that if it can be used wrong it will be used wrong.
Which one? Variant from
This says nothing, really. The deeper dive is unavoidable. Given below. First we need to establish whether to use
Cherry picking changes in
Introducing The conflict for This means manual resolution of suggests that Here's what the modified PR would look like: d23c840 Do you want me to proceed with this version?
Let me take a look, it'll probably be a Monday thing
A simpler way of looking at it:

Generally, a union can serve one of two purposes: it can either provide a cast to reinterpret your data in a more convenient way (e.g., a union of two u32 integers can be combined together and read as a single u64 without needing to do any bit shifting), OR it can allow you to reuse existing bytes for multiple mutually exclusive purposes.

In the latter case, there must be some way of determining which union member is the one that accurately represents the type of data stored within. If you have a union containing a float and an int, how do you know which one to use to interpret the data? If the union is passed around to code that can deal with both cases (float and int), then you'd need to also provide a flag that says "hey this is a float" or "hey this is an int".

If the union is reusing bytes for mutually exclusive purposes, then it means you don't need to be able to store both types of data at the same time, but you do need to be able to store both of them.

Adding a new member to a union without changing the size of the union isn't ever inherently a kABI break. What is a kABI break is trampling over data that something else stored and expects to retrieve fully intact. Therefore, to verify kABI correctness for new union members, you only need to audit the code accessing the new member to make sure it doesn't modify those bytes while something else is actively using those bytes for a different union member. It is ultimately just a synchronization problem.
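The two uses of a union described above can be sketched in plain C as follows. This is only an illustration of the general idea, not code from the kernel or from this PR; all `sketch_` names are hypothetical, and the half ordering in the reinterpretation case depends on the machine's endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Purpose 1: reinterpretation. The same 8 bytes viewed as one u64 or two u32. */
union sketch_u64_parts {
    uint64_t whole;
    uint32_t half[2]; /* which half is "low" depends on endianness */
};

static_assert(sizeof(union sketch_u64_parts) == sizeof(uint64_t),
              "both views must cover exactly the same bytes");

/* Purpose 2: byte reuse for mutually exclusive data, with an explicit tag. */
enum sketch_kind { SKETCH_INT = 1, SKETCH_FLOAT = 2 };

struct sketch_value {
    enum sketch_kind kind; /* the "flag" saying which member is live */
    union {
        int i;
        float f;
    } as;
};

/*
 * A consumer checks the tag before touching the union and rejects tags it
 * does not know, instead of guessing how to interpret the bytes.
 */
bool sketch_to_double(const struct sketch_value *v, double *out)
{
    switch (v->kind) {
    case SKETCH_INT:
        *out = (double)v->as.i;
        return true;
    case SKETCH_FLOAT:
        *out = (double)v->as.f;
        return true;
    default:
        return false; /* unknown member: refuse gracefully */
    }
}
```

A consumer written this way keeps working when a new member with a new tag value is added later, as long as the union's size stays the same and no code writes the new member while another member is live.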
[LTS 9.2]
CVE-2025-38084 VULN-71577
CVE-2025-38085 VULN-71586
CVE-2024-57883 VULN-46929
Summary
The driving CVE was CVE-2025-38085. The fix for CVE-2025-38084 was included because it is closely related (same patch set). Additionally, the fix for CVE-2025-38085 required a prerequisite which had its own CVE, CVE-2024-57883.
The changes differ visibly from the upstream. Most of the differences result from using stable 5.15 backports, which otherwise applied cleanly to the ciqlts9_2 codebase. The exception is 14967a9, which hasn't been backported to 5.15 yet and was adapted to ciqlts9_2 from the upstream by hand. The following table summarizes all the commits used and their role.
Commits
89824bf:
6b0f840:
d7056d3:
76a736b:
35e8761:
314f5ab:
kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Reference
kselftests–ciqlts9_2–run1.log
Patch
kselftests–ciqlts9_2-CVE-batch-12–run1.log
Comparison
The results in reference and patch were the same except for the net/forwarding:vxlan_asymmetric.sh test, which failed in the patched version for some reason.
The test was repeated on the patched version.
kselftests–ciqlts9_2-vxlan_asymmetric–run1.log
kselftests–ciqlts9_2-vxlan_asymmetric–run2.log
kselftests–ciqlts9_2-vxlan_asymmetric–run3.log