@bmastbergen
Collaborator

This change came as a patch from Google. It is not upstream because work on an upstream fix for this issue is still ongoing: https://lore.kernel.org/linux-mm/[email protected]/. This patch is a stopgap until the upstream solution is ready. I had to make a few tweaks to the supplied patch, which are listed in the upstream-diff section of the commit message.

Original patch:
v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch

jira KERNEL-173
feature Add ability to scrape LRU pages from offlined memcgs
commit-author: Yu Zhao <[email protected]>
commit-source v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch
commit-source-path Provided by Google Engineering
upstream-diff A few tweaks to the original patch were necessary:
              * Format changes because Documentation/sysctl/vm.txt has
                been changed to Documentation/admin-guide/sysctl/vm.rst
              * Removed unused nid variable from scrape_offlined_memcgs
              * Switched drop_caches_sysctl_handler to use SYSCTL_EIGHT
                (otherwise 'echo 8 > /proc/sys/vm/drop_caches' would be
                rejected)
              * Renamed nr_pages_to_scrape to offlined_memcg_nr_pages in the
                !CONFIG_MEMCG case to match the CONFIG_MEMCG case
              * Added 'return 0' to scrape_offlined_memcgs in the
                !CONFIG_MEMCG case

For offlined memcgs, kmem (slab) is reparented so that it does not hold refcnts which would in turn prevent those memcgs from being released.

However, reparenting does not apply to LRU pages (pagecache), and therefore they need to be scraped as well for offlined memcgs. "echo 8 > /proc/sys/vm/drop_caches" was introduced for this reason. And unlike "echo 1", it does not have performance impact on online memcgs in terms of zapping pagecache.
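
One way to observe offlined cgroups that have not yet been released is the nr_dying_descendants counter in a cgroup v2 cgroup.stat file. The sketch below is not part of the patch. It assumes cgroup v2 is mounted at /sys/fs/cgroup, and the function name dying_cgroups is made up for illustration. The counter covers all dying cgroups, not only memcgs pinned by pagecache.

```shell
#!/bin/sh
# Print the nr_dying_descendants value from a cgroup v2 cgroup.stat file.
# The default path assumes cgroup v2 is mounted at /sys/fs/cgroup;
# dying_cgroups is a hypothetical helper name, not something from the patch.
dying_cgroups() {
    stat_file="${1:-/sys/fs/cgroup/cgroup.stat}"
    # cgroup.stat holds "key value" pairs, one per line.
    awk '$1 == "nr_dying_descendants" { print $2 }' "$stat_file"
}
```

Calling dying_cgroups before and after "echo 8 > /proc/sys/vm/drop_caches" on a patched kernel should show whether the count drops. That is the effect the description implies, not something measured in this PR.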

Build Log

  CLEAN   scripts/mod
  CLEAN   scripts/selinux/genheaders
  CLEAN   scripts/selinux/mdp
  CLEAN   scripts
  CLEAN   include/config include/generated arch/x86/include/generated .config .config.old certs/signing_key.pem certs/signing_key.x509 certs/x509.genkey
[TIMER]{MRPROPER}: 16s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] virt/lib/irqbypass.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] sound/virtio/virtio_snd.ko
[TIMER]{BUILD}: 939s
Making Modules
  INSTALL /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
--
  SIGN    /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+/kernel/sound/virtio/virtio_snd.ko
  DEPMOD  /lib/modules/5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+
[TIMER]{MODULES}: 10s
Making Install
sh ./arch/x86/boot/install.sh \
	5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 61s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+ and Index to 1
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 16s
[TIMER]{BUILD}: 939s
[TIMER]{MODULES}: 10s
[TIMER]{INSTALL}: 61s
[TIMER]{TOTAL} 1045s
Rebooting in 10 seconds

Testing

Verified that 8 is now an accepted value for /proc/sys/vm/drop_caches. Also did a build with some debugging added to confirm that scrape_offlined_memcgs was actually being called.

brett@lycia ~/ciq/kernel-173 % cat drop_caches-5.14.0-284.30.1.el9_2.92ciq_lts.14.2.x86_64.log
[brett@kernel-173 ~]$ uname -a
Linux kernel-173 5.14.0-284.30.1.el9_2.92ciq_lts.14.2.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 24 18:30:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[brett@kernel-173 ~]$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 8 > /proc/sys/vm/drop_caches'
sh: line 1: echo: write error: Invalid argument
[brett@kernel-173 ~]$

brett@lycia ~/ciq/kernel-173 % cat drop_caches-5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+.log
[brett@kernel-173 ~]$ uname -a
Linux kernel-173 5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+ #1 SMP PREEMPT_DYNAMIC Mon Dec 1 21:19:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[brett@kernel-173 ~]$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$ sudo sh -c 'echo 8 > /proc/sys/vm/drop_caches'
[brett@kernel-173 ~]$

brett@lycia ~/ciq/kernel-173 %
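
The manual check above can be repeated with a small helper. This is a minimal sketch, not the script used for the transcripts. probe_drop_caches and the DROP_CACHES_FILE override are hypothetical names introduced here. By default it writes to the real sysctl, so it needs root and really does drop caches for every accepted value.

```shell
#!/bin/bash
# Report whether each given value is accepted by the drop_caches sysctl.
# DROP_CACHES_FILE is a hypothetical override so the function can be tried
# against a scratch file; by default it writes to the real sysctl, which
# needs root and really does drop caches for accepted values.
probe_drop_caches() {
    local target="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
    local value
    for value in "$@"; do
        # The builtin echo returns non-zero when the write fails
        # (for example with EINVAL on an out-of-range value).
        if echo "$value" 2>/dev/null > "$target"; then
            echo "$value accepted"
        else
            echo "$value rejected"
        fi
    done
}
```

Run as root, for example with "probe_drop_caches 1 2 3 8". Going by the two transcripts, the stock kernel should report 8 as rejected and the patched kernel should report all four values as accepted.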

Also did normal smoke testing

selftest-5.14.0-284.30.1.el9_2.92ciq_lts.14.2.x86_64-1.log

drop_caches-5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+.log

brett@lycia ~/ciq/kernel-173/kselftest-logs
 % grep ^ok selftest-5.14.0-284.30.1.el9_2.92ciq_lts.14.2.x86_64-1.log | wc -l
332
brett@lycia ~/ciq/kernel-173/kselftest-logs
 % grep ^ok selftest-5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+-1.log | wc -l
332
brett@lycia ~/ciq/kernel-173/kselftest-logs
 % grep ok <(diff -adU0 <(grep ^ok selftest-5.14.0-284.30.1.el9_2.92ciq_lts.14.2.x86_64-1.log | sort -h) <(grep ^ok selftest-5.14.0-bmastbergen_fips-9-compliant_5.14.0-284.30.1_KERNEL-1+-1.log | sort -h))

-ok 1 selftests: livepatch: test-livepatch.sh # SKIP
+ok 1 selftests: livepatch: test-livepatch.sh
-ok 1 selftests: zram: zram.sh # SKIP
+ok 1 selftests: zram: zram.sh
-ok 2 selftests: livepatch: test-callbacks.sh # SKIP
+ok 2 selftests: livepatch: test-callbacks.sh
+ok 32 selftests: net: l2tp.sh
-ok 3 selftests: livepatch: test-shadow-vars.sh # SKIP
+ok 3 selftests: livepatch: test-shadow-vars.sh
-ok 4 selftests: livepatch: test-state.sh # SKIP
+ok 4 selftests: livepatch: test-state.sh
-ok 58 selftests: kvm: max_guest_memory_test
-ok 5 selftests: livepatch: test-ftrace.sh # SKIP
+ok 5 selftests: livepatch: test-ftrace.sh
-ok 6 selftests: net: tls
+ok 9 selftests: net: test_bpf.sh
brett@lycia ~/ciq/kernel-173/kselftest-logs
 %

jira KERNEL-173
feature Add ability to scrape LRU pages from offlined memcgs
commit-author: Yu Zhao <[email protected]>
commit-source v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch
commit-source-path Provided by Google Engineering
upstream-diff A few tweaks to the original patch were necessary:
              * Format changes because Documentation/sysctl/vm.txt has
                been changed to Documentation/admin-guide/sysctl/vm.rst
              * Removed unused nid variable from scrape_offlined_memcgs
              * Switched drop_caches_sysctl_handler to use SYSCTL_EIGHT
                (otherwise 'echo 8 > /proc/sys/vm/drop_caches' would be
                rejected)
              * Renamed nr_pages_to_scrape to offlined_memcg_nr_pages in the
                !CONFIG_MEMCG case to match the CONFIG_MEMCG case
              * Added 'return 0' to scrape_offlined_memcgs in the
                !CONFIG_MEMCG case

For offlined memcgs, kmem (slab) is reparented so that it does not hold
refcnts which would in turn prevent those memcgs from being released.

However, reparenting does not apply to LRU pages (pagecache), and
therefore they need to be scraped as well for offlined memcgs.
"echo 8 > /proc/sys/vm/drop_caches" was introduced for this reason. And
unlike "echo 1", it does not have performance impact on online memcgs in
terms of zapping pagecache.

Signed-off-by: Yu Zhao <[email protected]>
Signed-off-by: Brett Mastbergen <[email protected]>
@bmastbergen bmastbergen requested a review from a team December 2, 2025 18:33