Fix memory exhaustion caused by circular references in PDF xref chains #787
Merged
k00ni merged 2 commits into smalot:master on Jan 8, 2026
#2)

* Initial plan
* Initial setup: Add crash file and reproducer script
* Fix infinite loop caused by circular xref references
* Add test case for circular xref reference fix
* Address review feedback: Add array<int> type hints and remove test
  - Added array<int> type hints to $visitedOffsets parameters in PHPDoc comments
  - Removed testCircularXrefReference test as requested by reviewer
  - Removed reproducer.php file as requested
  - Crash file memory-exhaustion-crash-f5d89649.txt remains permanently in samples/bugs/
* Remove crash file and add comprehensive tests for circular reference fix
  - Removed memory-exhaustion-crash-f5d89649.txt as requested
  - Added helper methods to expose protected functions for testing
  - Added testGetXrefDataPreventsCircularReferences: Tests that visitedOffsets prevents infinite loops
  - Added testDecodeXrefPassesVisitedOffsets: Tests that decodeXref correctly passes visitedOffsets
  - Added testGetXrefDataTracksVisitedOffsets: Tests that offsets are tracked properly
  - All tests use minimal PDF structures and don't rely on external crash file

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: k00ni <381727+k00ni@users.noreply.github.com>
Contributor
I haven't checked whether it solves #71, but the PR code ran without errors on my main search engine implementation (464 PDFs). Then again, I'm pretty sure none of those PDFs were malformed in a way that would have triggered this error.
Collaborator
Author
You would have noticed 😅 Thank you very much for taking the time.
Collaborator
Author
@j0k3r I had to release https://github.com/smalot/pdfparser/releases/tag/v2.12.3 without asking you because it contains a fix for a Denial of Service vulnerability and I was not sure who else can see the release draft. I hope that was OK.
Type of pull request
About
This PR fixes xref handling for circular references. A malformed PDF whose xref chain refers back to itself can send PDFParser into an endless loop, which ends in memory exhaustion. These changes should prevent that.
It allows me to work with a bunch of "broken" PDFs, but I might have overlooked some things. It would be great if someone else could have a look. Any feedback is appreciated (maybe @GreyWyvern @j0k3r?) 🚀
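For reviewers who want the idea in isolation: the commits mention a `$visitedOffsets` list that is passed along while xref data is read. The following is only a minimal sketch of that general technique (remember every offset already visited and stop when one repeats), written in Python rather than PHP. It is not the code from this PR: the function `collect_xref_offsets` and the `prev_of` mapping are hypothetical stand-ins, where `prev_of` plays the role of "the offset that this xref section's previous-section pointer refers to".

```python
from typing import Dict, List, Optional


def collect_xref_offsets(start_offset: int, prev_of: Dict[int, int]) -> List[int]:
    """Follow a chain of xref sections and return the offsets in visit order.

    prev_of is a hypothetical mapping: section offset -> offset of the
    previous section it points to. A section without an entry ends the chain.
    """
    visited: List[int] = []   # offsets in the order they were read
    seen = set()              # same offsets, for fast membership checks
    offset: Optional[int] = start_offset

    while offset is not None:
        if offset in seen:
            # Circular reference: this section was already processed,
            # so stop instead of looping until memory runs out.
            break
        seen.add(offset)
        visited.append(offset)
        offset = prev_of.get(offset)

    return visited


# A well-formed chain ends on its own; a circular one is cut off
# at the first repeated offset.
linear = collect_xref_offsets(300, {300: 200, 200: 100})
circular = collect_xref_offsets(100, {100: 50, 50: 100})
```

Without the `seen` check, the second call would never terminate, which is the failure mode described above.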