Add case for 'endobj' in PDF parsing#784
Conversation
Fixes an issue with parsing some PDF documents which contain this instruction. I have no idea what it means.
|
Thank you for your PR. Is it still work in progress? If not, there are a few tasks left to solve before I take a closer look. Please read https://github.com/smalot/pdfparser/blob/master/CONTRIBUTING.md for more information. |
|
@k00ni It's no longer WIP. I assume you mean the part about needing a test? I did look in the test cases but didn't find any current tests that deal with the "skip unknowns" behavior in the parser class, so I wasn't sure if the policy would be to allow it w/o test, or require a test to be written for just that one thing (same as other |
There was a problem hiding this comment.
Changing code without tests that cover the changes might lead to unintended side effects. We therefore only allow code changes without tests under rare circumstances.
But lets take a step back and get an overview about the underlying issue:
- I assume you receive an
Exception(Invalid type) without this code change? - Can you provide a PDF which triggers this problem and is free of charge so we can use it in the test suite?
- If you are not able to produce/provide a usable PDF (e.g. because it contains sensitive data) we have to think about a basic test case which triggers your code.
Don't get me wrong, I know that a faulty library can disrupt a project in a significant way but just because a PDF fails to parse doesn't mean the library needs a change. It might be because of the PDF itself, the library or somewhere in between. I hope we can find a way to "harden" the library but also add at least 1 test to cover the change.
|
Closed due to inactivity from the author. |
Fixes an issue with parsing some PDF documents which contain this instruction. I have no idea what it means.
Type of pull request
About
I am working on parsing a HUGE library of PDF files and some of those files contain an
endobjheader elementwhich isn't handled by the parser, and this results in an exception. The files do parse correctly if this header
element is simply ignored. It doesn't seem to have any practical use.
Checklist for code / configuration changes
See CONTRIBUTING.md for all essential information about contributing.