wpcomsh: bail to core's default fatal screen on memory-exhaustion fatals #50086

Draft
arthur791004 wants to merge 1 commit into trunk from fix/fatal-error-screen-oom-cascade

arthur791004 commented Jun 30, 2026

Contributor

Fixes #

Proposed changes

When the underlying fatal is a PHP memory-exhaustion error (OOM), the request is already at its memory ceiling. The wpcomsh branded fatal-error screen hooks wp_php_error_message and does memory-heavy work (textdomain load, plugin-header reads via glob/get_plugin_data, user bootstrap, ob_start), so on an OOM it re-fatals mid-render, which:

  • gives visitors a blank white page instead of even core's default critical-error screen,
  • logs the re-fatal against fatal-error-screen.php, masking the real culprit in logstash (this is what produced the fatal noise in the escalation).

Key constraint: a PHP OOM is not a catchable Throwable, so the render can't be wrapped in try/catch and recovered — the only way to let the screen render at the memory ceiling is to create headroom first.
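
To illustrate the constraint, here is a minimal sketch (not code from this PR) of why a try/catch wrapper is not enough. The helper name `sketch_render_or_fallback()` and the undefined function are hypothetical. A thrown `Error`, such as a call to an undefined function, is a `Throwable` and is caught; a memory-exhaustion fatal is raised as a fatal error rather than thrown, so it would never reach the `catch` block.

```php
<?php
/**
 * Hypothetical helper: run a render callback and fall back on any Throwable.
 * This recovers from thrown Errors/Exceptions only. A PHP memory-exhaustion
 * fatal is not thrown, so it cannot be recovered this way.
 */
function sketch_render_or_fallback( callable $render, string $fallback ): string {
    try {
        return (string) $render();
    } catch ( \Throwable $e ) {
        // Reached for thrown errors (e.g. "Call to undefined function").
        return $fallback;
    }
}

// A non-OOM fatal: calling an undefined function throws an Error, which is caught.
$caught = sketch_render_or_fallback(
    static function () {
        return sketch_undefined_function_xyz(); // hypothetical, intentionally undefined
    },
    'core default screen'
);

// A render that succeeds is returned unchanged.
$rendered = sketch_render_or_fallback(
    static function () {
        return 'branded screen';
    },
    'core default screen'
);

echo $caught, PHP_EOL;
echo $rendered, PHP_EOL;
```

Because the OOM case bypasses `catch` entirely, the guard has to act before the render starts, which is why the change creates headroom up front.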

This PR adds a single guard at the top of wpcomsh_customize_fatal_error_message():

    if ( wpcomsh_fatal_is_oom( $error ) && ! wpcomsh_fatal_raise_memory_limit() ) {
        return (string) $message;
    }

  • wpcomsh_fatal_is_oom() — detects memory-exhaustion fatals (both Allowed memory size of N bytes exhausted and Out of memory).
  • wpcomsh_fatal_raise_memory_limit() — bumps memory_limit by a bounded +32 MB via ini_set (which takes effect inside the fatal handler) and returns whether headroom is available.
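
The two helpers described above could look roughly like the following. This is a sketch under assumptions, not the PR's actual implementation: the `sketch_` function names, the `SKETCH_OOM_HEADROOM_BYTES` constant, and the shorthand parser are all hypothetical, and the real functions may detect or bound things differently. It assumes `$error` has the shape returned by `error_get_last()`.

```php
<?php
// Assumed headroom: the +32 MB mentioned in the description.
const SKETCH_OOM_HEADROOM_BYTES = 32 * 1024 * 1024;

/** Detect both memory-exhaustion message forms in an error_get_last()-style array. */
function sketch_fatal_is_oom( $error ): bool {
    if ( ! is_array( $error ) || ! isset( $error['message'] ) || ! is_string( $error['message'] ) ) {
        return false;
    }
    $message   = $error['message'];
    $limit_hit = false !== strpos( $message, 'Allowed memory size of' )
        && false !== strpos( $message, 'exhausted' );

    return $limit_hit || false !== strpos( $message, 'Out of memory' );
}

/** Convert an ini shorthand value such as "128M" or "1G" to bytes ("-1" stays -1). */
function sketch_ini_bytes( string $value ): int {
    $value  = trim( $value );
    $number = (int) $value; // Leading digits only: "128M" becomes 128.
    switch ( strtolower( substr( $value, -1 ) ) ) {
        case 'g':
            $number *= 1024;
            // Intentional fall-through.
        case 'm':
            $number *= 1024;
            // Intentional fall-through.
        case 'k':
            $number *= 1024;
    }
    return $number;
}

/** Raise memory_limit by a bounded amount; report whether headroom is available. */
function sketch_fatal_raise_memory_limit(): bool {
    $current = sketch_ini_bytes( (string) ini_get( 'memory_limit' ) );
    if ( $current < 0 ) {
        return true; // -1 means no limit, so there is nothing to raise.
    }
    $target = $current + SKETCH_OOM_HEADROOM_BYTES;
    if ( false === @ini_set( 'memory_limit', (string) $target ) ) {
        return false; // The platform refused the change.
    }
    // Confirm the new limit actually took effect.
    return sketch_ini_bytes( (string) ini_get( 'memory_limit' ) ) >= $target;
}

$oom_error = array(
    'type'    => E_ERROR,
    'message' => 'Allowed memory size of 134217728 bytes exhausted (tried to allocate 1048576 bytes)',
);
var_dump( sketch_fatal_is_oom( $oom_error ) );
```

Plain `strpos()` checks are used here instead of a regular expression to keep the detection itself as cheap as possible at the memory ceiling; that is a design assumption of this sketch.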

Behavior by case:

| Case | Result |
| --- | --- |
| Non-OOM fatal | Unchanged: full branded screen, error details, logstash event. |
| OOM, raise succeeds | Headroom created; the normal path runs unchanged, so the admin sees the error and the wpcomsh_fatal_signature logstash event still fires. |
| OOM, raise refused (ini_set capped/disabled) | Return core's default screen (no blank page); PHP still logs the actual fatal to logstash. |
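
The three cases reduce to one boolean condition, the same one used by the guard. A small sketch makes the mapping explicit; the function name `sketch_fatal_screen_outcome()` and the string labels are hypothetical and exist only for illustration.

```php
<?php
/**
 * Hypothetical decision helper mirroring the guard's condition:
 * only an OOM whose memory_limit raise was refused falls back to core's screen.
 */
function sketch_fatal_screen_outcome( bool $is_oom, bool $raise_succeeded ): string {
    if ( $is_oom && ! $raise_succeeded ) {
        return 'core-default';
    }
    return 'branded';
}

$cases = array(
    'Non-OOM fatal'       => array( false, false ),
    'OOM, raise succeeds' => array( true, true ),
    'OOM, raise refused'  => array( true, false ),
);

foreach ( $cases as $label => $args ) {
    printf( "%s => %s\n", $label, sketch_fatal_screen_outcome( $args[0], $args[1] ) );
}
```

For a non-OOM fatal the second argument is irrelevant: in the real guard the `&&` short-circuits, so the raise is never attempted.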

Notes:

  • $message is only core's generic wrapper HTML ("There has been a critical error on this website.") — no trace/error info — so returning it in the fallback loses nothing; the real error lives in $error (shown to admins) and PHP's fatal log.
  • A direct ini_set is used rather than wp_raise_memory_limit(), which runs apply_filters in the fatal path and no-ops when the limit is already high (as on Atomic).

Related product discussion/links

  • Escalation: p1782814612162629-slack-C02FMH4G8 (dotcom-escalations — "Memory fatal errors increasing in wpcomsh")

Does this pull request change what data or activity we track or use?

No new data is collected. In the rare fallback case (OOM where memory_limit can't be raised), our custom wpcomsh_fatal_signature logstash event does not fire — but the actual fatal is still logged to logstash by PHP regardless, and for an OOM that signature would be unreliable anyway (the error file is "wherever the next allocation landed", not the real cause).

Testing instructions

On an Atomic / wpcomsh test site (or local wpcomsh install):

Reproduce the regression (before this change):

  1. Drop a throwaway mu-plugin that exhausts memory:
    add_action( 'init', function () { $x = ''; while ( true ) { $x .= str_repeat( 'a', 1024 * 1024 ); } } );
  2. Load the front page. On trunk you get a blank page, and logstash records the fatal against wpcom-fatal-error/fatal-error-screen.php.

Verify the fix (with this branch):

  1. With the same mu-plugin active, load the front page.
  2. You should now see the full branded screen, or core's default critical-error screen if the memory_limit raise is refused; in neither case a blank page.
  3. As an admin, confirm the error details still show and the wpcomsh_fatal_signature logstash event still fires.
  4. Confirm the logged PHP fatal points at the real offending file, not fatal-error-screen.php.

Regression check — non-OOM fatals are unchanged:

  1. Swap the mu-plugin for a normal fatal, e.g. add_action( 'init', function () { undefined_function_xyz(); } );
  2. Confirm the full branded admin view (suspected-plugin card, recovery link, error details) still renders for admins, and the short apology for anonymous visitors.

arthur791004 added the [Status] Needs Review label Jun 30, 2026
arthur791004 self-assigned this Jun 30, 2026
arthur791004 marked this pull request as draft June 30, 2026 16:13
github-actions Bot commented Jun 30, 2026

Contributor

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WoA dev site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin (WordPress.com Site Helper), and enable the fix/fatal-error-screen-oom-cascade branch.

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

github-actions Bot commented Jun 30, 2026

Contributor

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for your cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!


Wpcomsh plugin:

  • Next scheduled release: Atomic deploys happen twice daily on weekdays (p9o2xV-2EN-p2)

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

@jp-launch-control


Code Coverage Summary

This PR did not change code coverage!

That could be good or bad, depending on the situation. Everything covered before, and still is? Great! Nothing was covered before? Not so great. 🤷

Full summary · PHP report

…errors

When the underlying fatal is a PHP OOM, the request is already at its memory
ceiling. Building the branded screen (textdomain load, plugin-header reads,
user bootstrap, output buffering) needs more memory than remains, so it
re-fatals mid-render: the visitor gets a blank page, the re-fatal is logged
against fatal-error-screen.php (masking the real culprit in logstash), and a
PHP OOM is not a catchable Throwable so the render can't be guarded.

On OOM, raise memory_limit a bounded amount for headroom, then run the normal
path unchanged so the admin still sees the error and the logstash event still
fires. If the platform refuses the raise, return core's default screen (PHP
still logs the fatal) instead of blanking out. A direct ini_set is used rather
than wp_raise_memory_limit(), which runs filters in the fatal path and no-ops
when the limit is already high (as on Atomic).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
arthur791004 force-pushed the fix/fatal-error-screen-oom-cascade branch from 93b7b35 to ccbdc27 July 1, 2026 05:38