This PR forces the id/var dir names to adopt a hive partitioning scheme - should it be optional? For the partitions, hive partitioning seems to be enforced/hardcoded at https://github.com/ltelab/tstore/blob/feat-hive-partitioning/tstore/archive/ts/writers/pyarrow.py#L77 |
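For readers unfamiliar with the two layouts under discussion, here is a minimal sketch of how a hive-style directory name differs from a plain one. It is illustrative only and not the tstore implementation: the helper `ts_dir`, the `hive` flag, and the key names `id` and `variable` are all assumptions.

```python
from pathlib import PurePosixPath


def ts_dir(base, id_value, var_name, hive=True, id_key="id", var_key="variable"):
    """Return the directory of one time series under `base`.

    Hypothetical helper: the key names and the `hive` flag are illustrative
    assumptions, not tstore's actual API.
    """
    if hive:
        # Hive scheme: each directory level is written as "<key>=<value>".
        parts = (f"{id_key}={id_value}", f"{var_key}={var_name}")
    else:
        # Plain scheme: each directory level is just the value.
        parts = (str(id_value), str(var_name))
    return PurePosixPath(base, *parts)


hive_path = ts_dir("archive", "station_1", "temperature")
plain_path = ts_dir("archive", "station_1", "temperature", hive=False)
```

With a flag like this, the same call site can produce either layout, which is roughly what "optional" would mean for the id/var level.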
c29b057 to d50c06f
|
I fixed some issues and rebased this to have a proper PR with this feature only. Now we can review it. I wonder: is it worth making the hive scheme optional at this point? I suggest we move forward with hive only. We may consider supporting further schemes later on. |
|
I will review this tomorrow or Friday @martibosch. But as a quick thought, I would not enforce Two further considerations. A TS object / partitioned Parquet dataset is readable by any language-agnostic dataframe/query engine that supports reading Parquet files. A TSTORE directory structure with hive partitioning is not readable:
|
|
ok I can make it optional, but I understand that for now we still leave the "hive" time partitioning hardcoded at https://github.com/ltelab/tstore/blob/feat-hive-partitioning/tstore/archive/ts/writers/pyarrow.py#L77 ? |
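As a point of reference for the time-partition level, hive-style paths encode each partition column as a `key=value` directory segment, so the partition values can be recovered from the path alone. The sketch below uses only the standard library. The function name `parse_hive_partitions` and the example keys (`year`, `month`) are assumptions for illustration, not taken from the tstore writer.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path):
    """Collect "key=value" directory segments of a path into a dict.

    Hypothetical helper for illustration; segments without "=" (such as
    the file name) are ignored, and values are kept as strings.
    """
    partitions = {}
    for segment in PurePosixPath(path).parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


example = parse_hive_partitions(
    "id=station_1/variable=temperature/year=2023/month=01/part-0.parquet"
)
```

A plain (non-hive) layout carries only the values, so a reader would need to know the order of the partition columns in advance.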
d50c06f to a87ea6f
|
I amended the first commit in order to try to make this work not only for tslong but also for tsdf write/load. |
a87ea6f to e5f7b83
|
I have added a second commit drafting what I understand should be the rationale of the |
|
Sorry again for overthinking and for the likely premature optimization, but this is probably a good point to consider whether we need the |
|
Once the above issues are clear we can see how we make the id and var-level hive scheme optional, e.g., allow paths of the form |