Skip to content

Conversation

Adash-nitro
Copy link

We've encountered following bug on two occasions, seems some "PDF creators" wrongly format the DA

This results in a
/Helv 0 Tf 0 g instead of a /Helv 0 Tf 0 g
notice the double space after Helv

Added unit test with PDF from https://eforms.com/employment/independent-contractor/service-contract/babysitter/

System.ArgumentOutOfRangeException: startIndex cannot be larger than length of string. (Parameter 'startIndex')
at System.String.ThrowSubstringArgumentOutOfRange(Int32 startIndex, Int32 length)
at System.String.Substring(Int32 startIndex)
at MuPDF.NET.Widget.ParseDa()
at MuPDF.NET.Utils.GetWidgetProperties(Annot annot, Widget widget)
at MuPDF.NET.Utils.FillWidget(Annot annot, Widget widget)
at MuPDF.NET.Page.LoadWidget(Int32 xref)
at MuPDF.NET.Page.GetWidgets(Int32[] types)+MoveNext()

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug in the Widget.ParseDa method that occurs when PDF creators incorrectly format the DA (Default Appearance) string with multiple consecutive spaces. The fix normalizes the TextDa string by removing empty entries when splitting, preventing an ArgumentOutOfRangeException during string parsing.

  • Updated string splitting logic to handle malformed DA strings with extra whitespace
  • Added unit test to verify the fix works with problematic PDFs

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
MuPDF.NET/Widget.cs Modified ParseDa method to use StringSplitOptions.RemoveEmptyEntries when splitting TextDa
MuPDF.NET.Test/WidgetTest.cs Added test case to verify widget parsing doesn't throw exceptions on malformed PDFs

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@MaksymTkachuk MaksymTkachuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

float fontSize = 0;
float[] col = { 0, 0, 0 };
string[] dat = TextDa.Split(' '); // split on any whitespace
string[] dat = TextDa.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); // split on any whitespace and remove empty entries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To cover all bases shouldn't this maybe be:

char[] whitespaceChars = { ' ', '\t', '\n', '\r', '\f', '\v' };
string[] dat = TextDa.Split(whitespaceChars, StringSplitOptions.RemoveEmptyEntries); // split on any whitespace and remove empty entries

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/6111298/best-way-to-specify-whitespace-in-a-string-split-operation suggests that one can use null or possibly an empty string to split on any whitespace character. Not sure whether this still needs StringSplitOptions.RemoveEmptyEntries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice1 @julian-smith-artifex-com !
Reworked the test to prove that, but RemoveEmptyEntries is still needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants