Parsing table columns #752

@mpele

I want to parse a PDF document that contains a table.
I obtained the text and its coordinates with getDataTm(). I expected that defining x-coordinate limits for each column would be enough to assign every text item to its column.
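
A minimal sketch of that column-limit idea, written in Python purely for illustration and operating on plain (x, y, text) tuples rather than on the library's output. The boundary values, the row tolerance, and the helper names `assign_column` and `group_rows` are assumptions made up for this example.

```python
from bisect import bisect_right

# Hypothetical x boundaries between columns, in PDF points (illustrative values).
COLUMN_LIMITS = [100.0, 250.0, 400.0]


def assign_column(x, limits=COLUMN_LIMITS):
    """Return the 0-based column index for an x coordinate."""
    return bisect_right(limits, x)


def group_rows(items, limits=COLUMN_LIMITS, row_tolerance=5.0):
    """Group (x, y, text) items into rows by y, then place each in a column."""
    rows = []
    # Larger y is assumed to be higher on the page, so sort descending.
    for x, y, text in sorted(items, key=lambda it: -it[1]):
        if rows and abs(rows[-1]["y"] - y) <= row_tolerance:
            row = rows[-1]  # close enough in y: same table row
        else:
            row = {"y": y, "cells": {}}
            rows.append(row)
        row["cells"][assign_column(x, limits)] = text
    return rows
```

This only works if every item on the page is expressed in the same coordinate system, which is exactly what the values below put in doubt.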

Unfortunately, some of the coordinate values are confusing. I have tried to find out what is happening, but without success.

I have noticed two anomalies. The first is in the values (x, y, text) for the row numbers in the first column:

50 331 1
50 298 2
796 42 3

Visually, the numbers are one above the other. I should also mention that the page is landscape and that $details['MediaBox'] gives 842.25 and 595.5. I noticed that 796 + 50 ≈ 842 and that the approximate row height is ~35 for all other cells, so is it possible that the reference point has been changed to the bottom right of the table?
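
The arithmetic behind that guess can be checked with a short sketch, again in Python for illustration only. It flags first-column items whose x deviates from the rest and recomputes x as a distance from the right page edge. The helper names `mirrored_x` and `find_outliers` and the tolerance are assumptions. The check covers only the x hypothesis and does not explain the y value of 42.

```python
PAGE_WIDTH = 842.25   # MediaBox width reported in this issue
PAGE_HEIGHT = 595.5   # MediaBox height reported in this issue

# (x, y, text) values for the first column, as reported in this issue.
first_column = [(50, 331, "1"), (50, 298, "2"), (796, 42, "3")]


def mirrored_x(x, page_width=PAGE_WIDTH):
    """Return x measured from the right page edge instead of the left."""
    return page_width - x


def find_outliers(items, tolerance=5.0):
    """Return items whose x differs from the first item's x by more than tolerance."""
    ref_x = items[0][0]
    return [it for it in items if abs(it[0] - ref_x) > tolerance]


for x, y, text in find_outliers(first_column):
    # For row 3 the mirrored x is 46.25, close to the 50 seen in rows 1 and 2.
    print(text, "mirrored x =", mirrored_x(x))
```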

The second mystery is the x coordinate of the last column, for which I got these values:

396 367 16.12.2024
396 333 16.12.2024
396 299 16.12.2024

The problem is that those x values fall in the middle of the table: there are columns with greater x values that are to the left of this column.

My question is: is there some math that I have missed, and is it possible that the coordinates do not use the same reference system throughout the document?
