NoSQL: Node IDs - API, SPI + general implementation #2728
Thanks for working on it! Left some comments. Given this is a big change (23 new files and 3 new modules), is it worth having a dev list discussion, so that people are aware of the changes and can contribute their ideas?
Some ID generation mechanisms,
like [Snowflake-IDs](https://medium.com/@jitenderkmr/demystifying-snowflake-ids-a-unique-identifier-in-distributed-computing-72796a827c9d),
require unique integer IDs for each running node. This framework provides a mechanism to assign each running node a
unique integer ID.
If the Snowflake ID generator requires such a complex node ID generator, maybe we should consider other options. Would it be possible to use other ID generators? Since we are in the persistence module already, why can't we use something like ObjectId in MongoDB, or a Java UUID?
The snowflake id generator is already used by the NoSQL persistence impl., of which this PR is just a sub-component.
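For readers unfamiliar with why a Snowflake-style generator needs a per-node integer at all, here is a minimal sketch. The bit widths, the class name `SnowflakeSketch`, and the explicit `nowMillis` parameter are assumptions chosen for illustration; they are not taken from the Polaris implementation.

```java
/** Illustrative only: bit widths and names are assumptions, not Polaris' actual ID layout. */
final class SnowflakeSketch {
  static final int NODE_BITS = 10;     // assumed: up to 1024 distinct node IDs
  static final int SEQUENCE_BITS = 12; // assumed: 4096 IDs per node per millisecond
  static final long MAX_NODE_ID = (1L << NODE_BITS) - 1;
  static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

  private final long nodeId;
  private long lastMillis = -1L;
  private long sequence = 0L;

  SnowflakeSketch(long nodeId) {
    if (nodeId < 0 || nodeId > MAX_NODE_ID) {
      throw new IllegalArgumentException("node ID out of range: " + nodeId);
    }
    this.nodeId = nodeId;
  }

  /** Returns the next ID for the given timestamp (passed in to keep the sketch deterministic). */
  synchronized long nextId(long nowMillis) {
    if (nowMillis < lastMillis) {
      throw new IllegalStateException("clock moved backwards");
    }
    if (nowMillis == lastMillis) {
      if (sequence == MAX_SEQUENCE) {
        throw new IllegalStateException("sequence exhausted for this millisecond");
      }
      sequence++;
    } else {
      lastMillis = nowMillis;
      sequence = 0L;
    }
    return compose(nowMillis, nodeId, sequence);
  }

  /** Packs timestamp, node ID and sequence into one 64-bit value. */
  static long compose(long millis, long nodeId, long sequence) {
    return (millis << (NODE_BITS + SEQUENCE_BITS)) | (nodeId << SEQUENCE_BITS) | sequence;
  }

  /** Extracts the node ID part of a composed ID. */
  static long nodeIdOf(long id) {
    return (id >>> SEQUENCE_BITS) & MAX_NODE_ID;
  }
}
```

Two generators with different node IDs cannot produce the same value even for the same millisecond and sequence, whereas two generators that share a node ID can. That is the property a cluster-wide node ID assignment is meant to protect.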
* `polaris-nodes-api` provides the necessary Java interfaces and immutable types.
* `polaris-nodes-impl` provides the storage agnostic implementation.
* `polaris-nodes-spi` provides the necessary interfaces to provide a storage specific implementation.
* `polaris-nodes-store-nosql` provides the storage implementation based on `polaris-persistence-nosql-api`.
Where is the module?
Currently it's in the end-to-end NoSQL PR: #1189 ... to be made available for review later (to allow for smaller, easier-to-review PRs, as discussed)
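The API / SPI / impl / store split discussed in this thread can be pictured roughly as follows. This is a hedged sketch only: `LeaseStoreSketch`, `InMemoryLeaseStoreSketch`, and `NodeLeaserSketch` are hypothetical names invented for illustration, and the real interfaces in the PR may look quite different. The idea shown is that the storage-agnostic logic probes candidate node IDs, while the storage-specific part only has to supply one atomic "claim if free or expired" operation.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical SPI: the storage-specific part, reduced to a single atomic operation. */
interface LeaseStoreSketch {
  /** Claims nodeId until expiresAtMillis if it is unclaimed or its lease has expired at nowMillis. */
  boolean tryClaim(int nodeId, long nowMillis, long expiresAtMillis);
}

/** In-memory stand-in for a store module; a real one would rely on the database's conditional writes. */
final class InMemoryLeaseStoreSketch implements LeaseStoreSketch {
  private final Map<Integer, Long> expiryByNode = new ConcurrentHashMap<>();

  @Override
  public boolean tryClaim(int nodeId, long nowMillis, long expiresAtMillis) {
    boolean[] claimed = {false};
    // compute() runs atomically per key, so two callers cannot both claim the same ID.
    expiryByNode.compute(nodeId, (id, currentExpiry) -> {
      if (currentExpiry == null || currentExpiry <= nowMillis) {
        claimed[0] = true;
        return expiresAtMillis;
      }
      return currentExpiry;
    });
    return claimed[0];
  }
}

/** Hypothetical storage-agnostic logic: try candidate IDs in order until one can be claimed. */
final class NodeLeaserSketch {
  private final LeaseStoreSketch store;
  private final int maxNodes;
  private final long leaseMillis;

  NodeLeaserSketch(LeaseStoreSketch store, int maxNodes, long leaseMillis) {
    this.store = store;
    this.maxNodes = maxNodes;
    this.leaseMillis = leaseMillis;
  }

  /** Returns a leased node ID, or empty if every ID currently has an active lease. */
  OptionalInt lease(long nowMillis) {
    for (int candidate = 0; candidate < maxNodes; candidate++) {
      if (store.tryClaim(candidate, nowMillis, nowMillis + leaseMillis)) {
        return OptionalInt.of(candidate);
      }
    }
    return OptionalInt.empty();
  }
}
```

With this shape, a storage-specific module only needs to implement the one conditional-write method, which may be why the store implementation can be reviewed separately from the rest.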
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.polaris.nodes.api;
I don't think anywhere else in Polaris needs this. Can we rename it to org.apache.polaris.nosql.nodes.api or org.apache.polaris.nosql.snowflakeid.nodes.api?
It is true that this PR adds code that is meant to support Snowflake ID generators.
Proposal: Align with existing ID Gen code in main.
- Package: `org.apache.polaris.ids.nodes.*`
- Location: `persistence/nosql/idgen/nodes/...`
@snazy @flyrain @dennishuo WDYT?
While the main API interface mentions "API to lease node IDs...", the use cases are not limited to that one. Another IMHO interesting use case is getting an overview of the active processes (= nodes) of a single Polaris cluster. Putting overly specific use cases or even specific call sites into the package name(s) feels like restricting the use cases.
I'd prefer to keep the current package names. I'm okay with renaming the packages to org.apache.polaris.nodeleases.* or org.apache.polaris.nodeids.* though. But the whole effort isn't user-facing at all, so later renames are possible w/o the risk of breaking anything.
+1 to nodeids
> to get an overview of the active processes (= nodes) of a single Polaris cluster.

Hi @snazy, could you elaborate on the use cases of the node ID generator beyond the Snowflake ID generator?
Hm, not sure I understand your cite of the get an overview of the active processes use case and question about another use case.
Hi @flyrain : the concrete use case ATM is feeding node IDs into the Snowflake ID generator. That is required for the NoSQL persistence to work end-to-end (#1189).
As a side benefit of maintaining a list of active node IDs, one can use that information to report the status of Polaris JVMs that allocate those node IDs. However, this is completely at the discretion of downstream projects that include Polaris libraries.
> Hm, not sure I understand your cite of the get an overview of the active processes use case and question about another use case.

The citation probably doesn't quite matter. I was trying to understand the node id generator use cases beyond snowflake id.

> As a side benefit of maintaining a list of active node IDs, one can use that information to report the status of Polaris JVMs that allocate those node IDs. However, this is completely at the discretion of downstream projects that include Polaris libraries.

Thanks, Dmitri! This feels more like a K8s-level concern rather than something at the application level (referring to the Polaris service). Could you shed some light on how downstream projects make use of these node IDs?
The only use case that I know of is the Snowflake IDs. I mentioned downstream together with "can", I did not mean to imply that such downstream projects already exist ATM :)
Generally I agree with @flyrain about starting with more constrained package names where possible, not because we're necessarily implying that the concepts within the package can't be useful in other use cases, but because it's best to be more "deliberate" when adopting the libraries into those other use cases, where we'll be able to better assess the suitability of which aspects constitute a stable SPI, whether there are pitfalls to document better, etc.
I do think the nodeids package name is at least an improvement over the more general nodes at this stage though, so maybe that's enough for now.
* `polaris-nodes-api` provides the necessary Java interfaces and immutable types.
* `polaris-nodes-impl` provides the storage agnostic implementation.
* `polaris-nodes-spi` provides the necessary interfaces to provide a storage specific implementation.
These modules are used by the Snowflake ID generator only. Can we merge them into the modules holding the Snowflake ID generators, so that the Snowflake ID generator is more consistent and self-contained?
That is a valid point 👍 I made a specific renaming proposal in the thread about the package name (above).
This PR provides a mechanism to assign a Polaris-cluster-wide unique node-ID to each Polaris instance, which is then used when generating Polaris-cluster-wide unique Snowflake-IDs. The change is fundamental for the NoSQL work, but is also needed by the existing relational JDBC persistence. It does not include any persistence-specific implementation.
Also move the expensive part to a `@PostConstruct` to not block CDI entirely from initializing.
I agree nodeids is an improvement over nodes as a package name, and I'm okay with moving forward with this PR as-is for now to unblock further work, though my top preference would've still been to constrain to a nosql package name initially, then if there are non-nosql use cases we can always move into a more general package name along with discussion about deeper documentation preferences as it comes.
We can maybe better come up with standard guidance within our related "SPI" discussion - to me, package names constitute some degree of "prescriptive" scoping of shared code, in contrast to the separation of compilation modules being more "descriptive" in nature. So it's more about what we're communicating to (especially, new) developers trying to find their way around the codebase than any pure technical consideration.
And in that vein it's always easier to start more constrained and make it more open as needed rather than the other way around.
The two sides of the coin for commitment to SPIs are that we can provide better stability and broad generalization of usage of core SPI packages by being selective in avoiding premature generalization.
Following up on apache#2728, this change moves the "nodeids" code to the `org.apache.polaris.nosql.nodeids` package.
@dennishuo @flyrain : Follow-up package rename PR: #2931