-
Notifications
You must be signed in to change notification settings - Fork 13
Fix: Widget.ParseDa normalize TextDa to handle wrong formatting #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in the Widget.ParseDa method that occurs when PDF creators incorrectly format the DA (Default Appearance) string with multiple consecutive spaces. The fix normalizes the TextDa string by removing empty entries when splitting, preventing an ArgumentOutOfRangeException during string parsing.
- Updated string splitting logic to handle malformed DA strings with extra whitespace
- Added unit test to verify the fix works with problematic PDFs
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
MuPDF.NET/Widget.cs | Modified ParseDa method to use StringSplitOptions.RemoveEmptyEntries when splitting TextDa |
MuPDF.NET.Test/WidgetTest.cs | Added test case to verify widget parsing doesn't throw exceptions on malformed PDFs |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
MuPDF.NET/Widget.cs
Outdated
float fontSize = 0; | ||
float[] col = { 0, 0, 0 }; | ||
string[] dat = TextDa.Split(' '); // split on any whitespace | ||
string[] dat = TextDa.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); // split on any whitespace and remove empty entries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To cover all bases shouldn't this maybe be:
char[] whitespaceChars = { ' ', '\t', '\n', '\r', '\f', '\v' };
string[] dat = TextDa.Split(whitespaceChars, StringSplitOptions.RemoveEmptyEntries); // split on any whitespace and remove empty entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://stackoverflow.com/questions/6111298/best-way-to-specify-whitespace-in-a-string-split-operation suggests that one can use null
or possibly an empty string to split on any whitespace character. Not sure whether this still needs StringSplitOptions.RemoveEmptyEntries
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice1 @julian-smith-artifex-com !
Reworked the test to prove that, but RemoveEmptyEntries
is still needed
We've encountered following bug on two occasions, seems some "PDF creators" wrongly format the DA
This results in a
/Helv 0 Tf 0 g
instead of a/Helv 0 Tf 0 g
notice the double space after Helv
Added unit test with PDF from https://eforms.com/employment/independent-contractor/service-contract/babysitter/