@bmastbergen bmastbergen commented Nov 26, 2025

This change came as a patch from Google. It is not upstream because work on an upstream fix for this issue is still ongoing: https://lore.kernel.org/linux-mm/[email protected]/. This patch is a stopgap until the upstream solution is ready. I had to make a few tweaks to the supplied patch, which are listed in the upstream-diff section of the commit message.

Original patch:
v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch

mm: scrape LRU pages for offlined memcgs

jira KERNEL-172
feature Add ability to scrape LRU pages from offlined memcgs
commit-author Yu Zhao <[email protected]>
commit-source v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch
commit-source-path Provided by Google Engineering
upstream-diff A few tweaks to the original patch were necessary:
              * Removed unused nid variable from scrape_offlined_memcgs
              * Switched extra2 to 8 (otherwise 'echo 8 > /proc/sys/vm/drop_caches'
                would be rejected)
              * Renamed nr_pages_to_scrape to offlined_memcg_nr_pages in the
                !CONFIG_MEMCG case to match the CONFIG_MEMCG case
              * Added 'return 0' to scrape_offlined_memcgs in the
                !CONFIG_MEMCG case

For offlined memcgs, kmem (slab) is reparented so that it does not hold refcnts which would in turn prevent those memcgs from being released.

However, reparenting does not apply to LRU pages (pagecache), and therefore they need to be scraped as well for offlined memcgs. "echo 8 > /proc/sys/vm/drop_caches" was introduced for this reason. And unlike "echo 1", it does not have performance impact on online memcgs in terms of zapping pagecache.
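
The extra2 tweak in the upstream-diff list can be illustrated with a small userspace model. This is only a sketch, assuming the drop_caches sysctl is range-checked against a minimum and a maximum (extra2) and that out-of-range writes fail with "Invalid argument". The function name sysctl_range_check and the bounds passed to it are hypothetical and are not taken from the patch.

```shell
#!/bin/sh
# Userspace model only: the real check is done by the kernel's sysctl
# handling, not by this script. sysctl_range_check is a hypothetical name.
sysctl_range_check() {
    value=$1
    min=$2
    max=$3   # plays the role of extra2 in this model
    if [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]; then
        echo "accepted"
    else
        echo "Invalid argument"
    fi
}
```

With a maximum below 8 (for example `sysctl_range_check 8 1 4`, where 4 is an assumed bound) the model rejects the write, while `sysctl_range_check 8 1 8` accepts it, which is the behaviour the extra2 change is meant to allow.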

Build Log

/home/brett/kernel-src-tree
Running make mrproper...
[TIMER]{MRPROPER}: 11s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-us122l.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1024s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/cast5-avx-x86_64.ko
--
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+
[TIMER]{MODULES}: 11s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 55s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 11s
[TIMER]{BUILD}: 1024s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 55s
[TIMER]{TOTAL} 1117s
Rebooting in 10 seconds

Testing

Verified that 8 is now an accepted value for /proc/sys/vm/drop_caches. Also did a build with some debug output to confirm that scrape_offlined_memcgs was actually being called.

brett@lycia ~/ciq/kernel-172 % cat drop_caches-4.18.0-553.16.1.el8_10.ciqfips.0.14.1.x86_64.log
[brett@kernel-172 ~]$ uname -a
Linux kernel-172 4.18.0-553.16.1.el8_10.ciqfips.0.14.1.x86_64 #1 SMP Thu Oct 2 20:02:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[brett@kernel-172 ~]$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 8 > /proc/sys/vm/drop_caches'
sh: line 0: echo: write error: Invalid argument
[brett@kernel-172 ~]$

brett@lycia ~/ciq/kernel-172 % cat drop_caches-4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+.log
[brett@kernel-172 ~]$ uname -a
Linux kernel-172 4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+ #1 SMP Tue Nov 25 20:23:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[brett@kernel-172 ~]$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$ sudo sh -c 'echo 8 > /proc/sys/vm/drop_caches'
[brett@kernel-172 ~]$
brett@lycia ~/ciq/kernel-172 %
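
The manual check in the two logs above could be scripted roughly as follows. This is a minimal sketch, not something that ships with the patch: probe_sysctl_write is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a scratch file instead of the real sysctl. On a real system it must run as root, and writing 1, 2, or 3 really does drop caches.

```shell
#!/bin/sh
# Hypothetical helper: report whether a write of VALUE to a sysctl file
# succeeds. Defaults to /proc/sys/vm/drop_caches.
probe_sysctl_write() {
    value=$1
    target=${2:-/proc/sys/vm/drop_caches}
    # Run the write in a subshell so that both redirection errors and
    # write errors (such as "Invalid argument") are silenced and only
    # the exit status is used.
    if ( printf '%s\n' "$value" > "$target" ) 2>/dev/null; then
        echo "accepted"
    else
        echo "rejected"
    fi
}
```

Going by the logs above, `probe_sysctl_write 8` would be expected to report "rejected" on the 553.16.1 kernel and "accepted" on the patched build.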

Also ran the normal smoke tests (kselftests):

selftest-4.18.0-553.16.1.el8_10.ciqfips.0.14.1.x86_64-1.log

selftest-4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+-1.log

brett@lycia ~/ciq/kernel-172/kselftest-logs
 % grep ^ok selftest-4.18.0-553.16.1.el8_10.ciqfips.0.14.1.x86_64-1.log | wc -l
269
brett@lycia ~/ciq/kernel-172/kselftest-logs
 % grep ^ok selftest-4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+-1.log | wc -l
271
brett@lycia ~/ciq/kernel-172/kselftest-logs
 % grep ok <(diff -adU0 <(grep ^ok selftest-4.18.0-553.16.1.el8_10.ciqfips.0.14.1.x86_64-1.log | sort -h) <(grep ^ok selftest-4.18.0-bmastbergen_rlc-8_4.18.0-553.83.1.el8_10_KERNEL-172-9+-1.log | sort -h))

-ok 1 selftests: livepatch: test-livepatch.sh # SKIP
+ok 1 selftests: livepatch: test-livepatch.sh
-ok 1 selftests: zram: zram.sh # SKIP
+ok 1 selftests: zram: zram.sh
+ok 29 selftests: net: l2tp.sh
-ok 2 selftests: livepatch: test-callbacks.sh # SKIP
+ok 2 selftests: livepatch: test-callbacks.sh
-ok 38 selftests: net: bareudp.sh # SKIP
+ok 38 selftests: net: bareudp.sh
-ok 3 selftests: livepatch: test-shadow-vars.sh # SKIP
+ok 3 selftests: livepatch: test-shadow-vars.sh
-ok 4 selftests: livepatch: test-state.sh # SKIP
+ok 4 selftests: livepatch: test-state.sh
-ok 5 selftests: livepatch: test-ftrace.sh # SKIP
+ok 5 selftests: livepatch: test-ftrace.sh
+ok 9 selftests: net: test_bpf.sh
brett@lycia ~/ciq/kernel-172/kselftest-logs
 %
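
Note that `grep ^ok` also counts tests that were skipped, since those lines start with "ok" and end in "# SKIP". In the diff above, the two lines added without a removed counterpart (net: l2tp.sh and net: test_bpf.sh) account for the 269 vs 271 difference, and the remaining changes are tests that were skipped on the old kernel and ran on the new one. A sketch of helpers that separate those cases is below. The function names are hypothetical and the only assumption about the log format is the "ok ..." / "# SKIP" convention visible in the output above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two kselftest logs.

# Count every line reported as ok, including skipped tests.
count_ok() {
    grep -c '^ok' "$1"
}

# Count lines reported as ok that were not skipped.
count_passed() {
    grep '^ok' "$1" | grep -vc '# SKIP'
}

# Print tests that were "ok ... # SKIP" in the first log and plain "ok"
# in the second log.
unskipped() {
    grep '^ok .*# SKIP' "$1" | sed 's/ *# SKIP.*$//' |
    while IFS= read -r line; do
        if grep -Fxq "$line" "$2"; then
            printf '%s\n' "$line"
        fi
    done
}
```

For example, `unskipped old.log new.log` would list entries such as the livepatch tests above, where old.log and new.log stand in for the two selftest log files.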

jira KERNEL-172
feature Add ability to scrape LRU pages from offlined memcgs
commit-author Yu Zhao <[email protected]>
commit-source v1-0001-mm-scrape-LRU-pages-for-offlined-memcgs.patch
commit-source-path Provided by Google Engineering
upstream-diff A few tweaks to the original patch were necessary:
              * Removed unused nid variable from scrape_offlined_memcgs
              * Switched extra2 to 8 (otherwise 'echo 8 > /proc/sys/vm/drop_caches'
                would be rejected)
              * Renamed nr_pages_to_scrape to offlined_memcg_nr_pages in the
                !CONFIG_MEMCG case to match the CONFIG_MEMCG case
              * Added 'return 0' to scrape_offlined_memcgs in the
                !CONFIG_MEMCG case

For offlined memcgs, kmem (slab) is reparented so that it does not hold
refcnts which would in turn prevent those memcgs from being released.

However, reparenting does not apply to LRU pages (pagecache), and
therefore they need to be scraped as well for offlined memcgs.
"echo 8 > /proc/sys/vm/drop_caches" was introduced for this reason. And
unlike "echo 1", it does not have performance impact on online memcgs in
terms of zapping pagecache.

Signed-off-by: Yu Zhao <[email protected]>
Signed-off-by: Brett Mastbergen <[email protected]>