Skip to content

[Subtask]: Supports forwarding OpenAPI requests to the master node in master-slave mode.#3969

Open
wardlican wants to merge 20 commits intoapache:masterfrom
wardlican:amoro#3963
Open

[Subtask]: Supports forwarding OpenAPI requests to the master node in master-slave mode.#3969
wardlican wants to merge 20 commits intoapache:masterfrom
wardlican:amoro#3963

Conversation

@wardlican
Copy link
Copy Markdown
Contributor

@wardlican wardlican commented Nov 25, 2025

Why are the changes needed?

In master-slave mode, requests from the slave node's OpenAPI will be automatically forwarded to the master node.

Close #3963.

Brief change log

Add the RequestForwarder class to intercept and forward requests in http.before.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added module:ams-server Ams server module module:ams-optimizer AMS optimizer module module:common labels Nov 25, 2025
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 26, 2025
@wardlican
Copy link
Copy Markdown
Contributor Author

Please help with the code review.

@github-actions github-actions bot added type:docs Improvements or additions to documentation and removed stale labels Dec 27, 2025
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 30, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 7, 2026

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Feb 7, 2026
@czy006 czy006 reopened this Mar 19, 2026
@github-actions github-actions bot removed the stale label Mar 20, 2026
@zhoujinsong
Copy link
Copy Markdown
Contributor

Thanks for the contribution! After reviewing the current master branch architecture, I think this request forwarding mechanism may no longer be necessary. Here's the reasoning:

Why forwarding is not needed:

  1. All write operations go directly to DBcreateCatalog, dropCatalog, updateCatalog, and table creation all write to the database directly. No in-memory leader-exclusive state is involved. Any node can handle these requests independently.

  2. Optimizing status is fully persisted — Every OptimizingStatus transition is written to table_runtime.status_code synchronously (via StatedPersistentBase.invokeConsistency()). TableController already reads status directly from DB via TableRuntimeMapper.selectRuntime(), not from in-memory tableRuntimeMap. Any node can serve these queries correctly.

  3. Optimizing process history is also DB-backedgetOptimizingProcesses queries optimizing_process table directly. No forwarding needed.

  4. DefaultTableService handles cross-node table sync natively — In master-slave mode, each node periodically syncs its assigned bucket tables via syncBucketTables(). The bucket routing is already handled at the data layer, not the HTTP layer.

The only purely in-memory state is the optimizer heartbeat tracking and in-flight task assignment inside OptimizingQueue, but these are operational monitoring concerns, not correctness-critical for API responses.

Given this, adding an HTTP proxy layer with circuit breakers, retry logic, and exception-as-control-flow introduces significant complexity without a clear benefit. A simpler approach would be to route Dashboard traffic through a load balancer.

Would be happy to discuss further if there are specific cases I missed.

wardli added 15 commits April 6, 2026 09:50
…troduces a new solution for storing allocation information based on a database.
…troduces a new solution for storing allocation information based on a database.
@github-actions github-actions bot removed the module:ams-optimizer AMS optimizer module label Apr 6, 2026
@wardlican
Copy link
Copy Markdown
Contributor Author

Thanks for the contribution! After reviewing the current master branch architecture, I think this request forwarding mechanism may no longer be necessary. Here's the reasoning:

Why forwarding is not needed:

  1. All write operations go directly to DBcreateCatalog, dropCatalog, updateCatalog, and table creation all write to the database directly. No in-memory leader-exclusive state is involved. Any node can handle these requests independently.
  2. Optimizing status is fully persisted — Every OptimizingStatus transition is written to table_runtime.status_code synchronously (via StatedPersistentBase.invokeConsistency()). TableController already reads status directly from DB via TableRuntimeMapper.selectRuntime(), not from in-memory tableRuntimeMap. Any node can serve these queries correctly.
  3. Optimizing process history is also DB-backedgetOptimizingProcesses queries optimizing_process table directly. No forwarding needed.
  4. DefaultTableService handles cross-node table sync natively — In master-slave mode, each node periodically syncs its assigned bucket tables via syncBucketTables(). The bucket routing is already handled at the data layer, not the HTTP layer.

The only purely in-memory state is the optimizer heartbeat tracking and in-flight task assignment inside OptimizingQueue, but these are operational monitoring concerns, not correctness-critical for API responses.

Given this, adding an HTTP proxy layer with circuit breakers, retry logic, and exception-as-control-flow introduces significant complexity without a clear benefit. A simpler approach would be to route Dashboard traffic through a load balancer.

Would be happy to discuss further if there are specific cases I missed.

Thanks for the contribution! After reviewing the current master branch architecture, I think this request forwarding mechanism may no longer be necessary. Here's the reasoning:

Why forwarding is not needed:

  1. All write operations go directly to DBcreateCatalog, dropCatalog, updateCatalog, and table creation all write to the database directly. No in-memory leader-exclusive state is involved. Any node can handle these requests independently.
  2. Optimizing status is fully persisted — Every OptimizingStatus transition is written to table_runtime.status_code synchronously (via StatedPersistentBase.invokeConsistency()). TableController already reads status directly from DB via TableRuntimeMapper.selectRuntime(), not from in-memory tableRuntimeMap. Any node can serve these queries correctly.
  3. Optimizing process history is also DB-backedgetOptimizingProcesses queries optimizing_process table directly. No forwarding needed.
  4. DefaultTableService handles cross-node table sync natively — In master-slave mode, each node periodically syncs its assigned bucket tables via syncBucketTables(). The bucket routing is already handled at the data layer, not the HTTP layer.

The only purely in-memory state is the optimizer heartbeat tracking and in-flight task assignment inside OptimizingQueue, but these are operational monitoring concerns, not correctness-critical for API responses.

Given this, adding an HTTP proxy layer with circuit breakers, retry logic, and exception-as-control-flow introduces significant complexity without a clear benefit. A simpler approach would be to route Dashboard traffic through a load balancer.

Would be happy to discuss further if there are specific cases I missed.

Yes, all current state information is retrieved from the database, and any updated states are also synchronized back to it. Since every AMS node synchronizes its state information from the database, this feature is not required.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.59%. Comparing base (fe98a6c) to head (eaf0d76).
⚠️ Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
...che/amoro/exception/RequestForwardedException.java 0.00% 13 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (fe98a6c) and HEAD (eaf0d76). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (fe98a6c) HEAD (eaf0d76)
core 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3969      +/-   ##
============================================
- Coverage     29.93%   22.59%   -7.34%     
+ Complexity     4229     2601    -1628     
============================================
  Files           675      462     -213     
  Lines         53990    42524   -11466     
  Branches       6838     5993     -845     
============================================
- Hits          16161     9608    -6553     
+ Misses        36636    32079    -4557     
+ Partials       1193      837     -356     
Flag Coverage Δ
core ?
trino 22.59% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@czy006
Copy link
Copy Markdown
Contributor

czy006 commented Apr 6, 2026

This is an enhanced feature and does not affect our usage.

@czy006
Copy link
Copy Markdown
Contributor

czy006 commented Apr 6, 2026

In master-slave mode, startup uses registerAndElect() + startOptimizingService() and does not go through waitLeaderShip().
However, in ZK HA, leader REST endpoint data is written in waitLeaderShip(), while forwarding reads from getLeaderNodeInfo() (table-service master path).
This creates a gap where followers cannot resolve leader REST info and forwarding fails even when the cluster is healthy.

@wardlican
Copy link
Copy Markdown
Contributor Author

In master-slave mode, startup uses registerAndElect() + startOptimizingService() and does not go through waitLeaderShip(). However, in ZK HA, leader REST endpoint data is written in waitLeaderShip(), while forwarding reads from getLeaderNodeInfo() (table-service master path). This creates a gap where followers cannot resolve leader REST info and forwarding fails even when the cluster is healthy.

It seems there was an issue while resolving conflicts during the merge process. I'll check and fix it.

@czy006 czy006 requested a review from xxubai April 7, 2026 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module module:common type:docs Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask]: Supports forwarding OpenAPI requests to the master node in master-slave mode.

4 participants