Description
I want to parse a PDF document that contains a table.
I extracted the text and its coordinates with getDataTm(). My plan was to define x-coordinate limits for each column and assign every text item to a column by its x value, expecting that to solve the problem.
Unfortunately, some of the coordinates are confusing, and I have not been able to work out what is happening.
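For reference, the column-limit idea can be sketched as follows. This is a minimal illustration in Python, not the library's API: the `COLUMN_LIMITS` values and the `column_index` helper are hypothetical, and each item is assumed to be an (x, y, text) tuple like the values quoted in this issue.

```python
from bisect import bisect_right

# Hypothetical x boundaries between columns, in PDF points.
# These numbers are made up for illustration; N boundaries give N + 1 columns.
COLUMN_LIMITS = [100.0, 300.0, 500.0, 700.0]


def column_index(x, limits=COLUMN_LIMITS):
    """Return the 0-based index of the column whose x range contains x."""
    return bisect_right(limits, x)


# (x, y, text) items taken from the values reported in this issue.
items = [
    (50.0, 331.0, "1"),
    (50.0, 298.0, "2"),
    (396.0, 367.0, "16.12.2024"),
]

columns = [column_index(x) for x, _y, _text in items]
print(columns)
```

This only works if every x value is expressed in the same coordinate system, which is exactly what the anomalies below call into question.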
I have noticed two anomalies. The first is in the coordinates (x, y, text) of the row numbers in the first column:
50 331 1
50 298 2
796 42 3
Visually, the numbers sit directly one above the other. The page is landscape, and $details['MediaBox'] gives 842.25 and 595.5. I noticed that 796 + 50 ≈ 842 and that the row height is about 35 for all other cells, so is it possible that the reference point has moved to the bottom right of the table?
The second mystery is the x coordinate of the last column, for which I got these values:
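The arithmetic behind that guess can be written out as a small check. This is only a sketch of the hypothesis that the odd x value is measured from the right edge of the page; `mirrored_x` is a hypothetical helper, and the page width is the MediaBox value quoted above.

```python
PAGE_WIDTH = 842.25  # MediaBox width reported in this issue


def mirrored_x(x, page_width=PAGE_WIDTH):
    """Re-express x as a distance from the right edge of the page."""
    return page_width - x


suspect_x = 796.0   # x reported for row number 3
expected_x = 50.0   # x reported for row numbers 1 and 2

candidate = mirrored_x(suspect_x)
gap = abs(candidate - expected_x)
print(candidate, gap)
```

The mirrored value lands within a few points of 50, which is why a flipped origin looks plausible for x. The check says nothing about the y value of 42, so it does not confirm the hypothesis on its own.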
396 367 16.12.2024
396 333 16.12.2024
396 299 16.12.2024
The problem is that these x values fall in the middle of the table: columns that appear to the left of this one have greater x values.
My question is: is there some math I have missed, and is it possible that the coordinates do not share one reference system across the whole document?