fix: mark stake transactions correctly when they land more than one block after broadcast by Alann27 · Pull Request #278 · pokt-network/igniter

Alann27 · 2026-04-20T18:35:04Z

Summary

Fixes a race condition where stake transactions that landed two or more blocks after broadcast were being marked as failure in the middleman DB even though the chain accepted them.
Replaces the one-shot post-waitForNextBlock check with a Temporal activity-retry-driven poll across the full TX_EXPIRATION_BLOCKS window (~30 min at pocket's ~1-min block time)
Makes unexpected errors (RPC outages, bugs) fail the workflow loudly so the tx stays Pending for human triage instead of being silently converted to Failure

Root cause

broadcastTxSync confirms mempool acceptance, not block inclusion — a tx may land one, two, or more blocks later depending on backlog. The previous flow did:

getBlockHeight() → broadcast → waitForNextBlock(height) (waits for height + 1)
verifyTransaction(hash, height) — one shot across Tiers 1–4

When the tx landed at height + 2 (or later), every tier failed at height + 1:

Tier 1 (tx_index) hadn't caught up
Tier 2 (REST) still returned 404
Tier 3 (block scan) iterated 30 heights but every comet.block(h) past chain head threw and silently continued
Tier 4 (supplier state) didn't find the supplier on-chain yet (it landed at height + 2 too)

Resulting DB write: status=failure, code=∅, log=∅, consumedFee=0 — exactly what the user reported.

What changed

apps/middleman-workflows/src/activities/index.ts

verifyTransaction now throws ApplicationFailure.retryable('Transaction not found on-chain', 'TX_NOT_FOUND', …) when the hash is not on-chain, returning [success, code, gasUsed] only when found
New checkSupplierOnChain(operatorAddress) activity for the Tier 4 positive-only fallback, extracted out of verifyTransaction

apps/middleman-workflows/src/workflows/ExecuteTransaction.ts

verifyTransaction moved to its own proxyActivities group with { initialInterval: "60s", backoffCoefficient: 1, maximumAttempts: TX_EXPIRATION_BLOCKS } — one attempt per ~block for the full expiration window
try/catch around the activity; isTxNotFoundFailure(err) helper discriminates retries-exhausted (ApplicationFailure.type === 'TX_NOT_FOUND') from any other failure, which is rethrown so the workflow fails and the tx stays Pending
Tier 4 (checkSupplierOnChain) runs only after TX_NOT_FOUND retries are exhausted; a positive result short-circuits to success, a missing supplier stays Failure
code is now persisted on every verification update, and log distinguishes: chain rejection (verification failed with code N), supplier-state fallback success, and true TX_NOT_FOUND after retries

apps/middleman-workflows/src/lib/blockchain/index.ts

Tier 3 block scan capped at the current chain head (Math.min(height + maxBlocks, latestHeight)); future heights no longer consume the scan budget by throwing and continue-ing. The workflow-level retry policy handles waiting for new blocks.

Behavior matrix

Scenario	DB result
Tx found, chain code 0	`Success`
Tx found, chain code ≠ 0	`Failure`, `log = "verification failed with code N"`
Tx not found after 30 retries, supplier on-chain	`Success`, `log = "verified via supplier state fallback"`
Tx not found after 30 retries, supplier not on-chain	`Failure`, `log = "tx not found on-chain after 30 retries"`
RPC unreachable / unexpected error anywhere in verify	Workflow fails; tx marked as `Failure`

…dress (#274) * Added migrate button in keys page and key detail to enable providers migrate keys to another address group. * Added address group migration workflow * added confirmation step to bulk migrate modal, added success step to individual migrate key, and some other enhancements * refactored `migrateKeysToAddressGroup` and showing migrate to group button for all keys * Fix sidebar and overlay pages not accounting for notification banner height

…ncy (#276) - Remove transactionsToNodes relation from getNodesByUser to fix PostgreSQL statement timeout that crashed the overview and import suppliers pages - Load supplier transactions on demand in the detail panel via GetNode, skipping the fetch when transactions are already provided by the caller - Replace lastUpdatedHeight for the suppliers table Height column instead of deriving it from stake transactions; add info tooltip explaining the field - Add csvHeader field to CsvColumnDef so JSX column headers export correctly - Fetch user nodes once in the unstake page and pass data down to step components, eliminating redundant useQuery calls across steps - Add copyToClipboard utility with execCommand fallback for non-HTTPS contexts and migrate all navigator.clipboard.writeText usages to it

…lock after broadcast - Poll verifyTransaction across the full TX_EXPIRATION_BLOCKS window via Temporal's activity retry policy (one attempt per ~block) instead of a single post-waitForNextBlock check, so txs that land two or more blocks after broadcast are no longer mis-marked as failure - Make verifyTransaction throw retriable TX_NOT_FOUND when the hash is not on-chain, and split the supplier-state fallback into a dedicated checkSupplierOnChain activity that only runs after retries are exhausted (positive-only: missing supplier stays Failure) - Discriminate the verify catch with isTxNotFoundFailure so RPC outages and other unexpected errors fail the workflow loudly (tx stays Pending) rather than being silently converted to Failure on insufficient evidence - Cap the Tier 3 block scan at the current chain head so future heights stop consuming the scan budget; the workflow-level retry drives forward progress instead - Persist code on every verification update and write a log that distinguishes chain rejection ("verification failed with code N"), supplier fallback success, and true TX_NOT_FOUND after retries

jorgecuesta · 2026-04-20T20:02:16Z

Workflow fail, then what? will retry?

Alann27 · 2026-04-20T21:24:04Z

Workflow fail, then what? will retry?

@jorgecuesta since it remains in a Pending state, it will be retried in the next workflow run. If an unhandled error persists, it will continue retrying indefinitely. We may want to revise this behavior—for example, by marking the transaction as Failure after it has been retried a certain number of times.

jorgecuesta · 2026-04-20T21:34:43Z

@Alann27 I think yes, we should mark as failed after all those retries+supplier check.

jorgecuesta

Did a thorough pass on the tx-landing fix and the migrate-keys flow. Inline comments on what I'd consider blocking.

jorgecuesta · 2026-05-27T20:21:29Z

@@ -127,20 +139,58 @@ export async function ExecuteTransaction(args: TransactionArgs) {
    await waitForNextBlock(txHeight);


The txHeight passed here comes from line 78 (const txHeight = await getBlockHeight()), which is captured BEFORE executeTransaction. If the chain advances during the broadcast — perfectly possible under load with variable block time — txHeight is stale.

waitForNextBlock(txHeight) checks currentHeight >= txHeight + 1. Since we already advanced, it returns immediately without waiting — exactly the wait the rest of the fix relies on. verifyTransaction then goes to look for the tx in the same block where it was included, where the tx_index probably hasn't indexed it yet.

The retries with 60s backoff eventually recover, but we're burning minutes of overhead that waitForNextBlock was meant to avoid.

Fix: re-capture the height post-broadcast before calling waitForNextBlock, or wait for txHeight + 2 to have a one-block margin.

jorgecuesta · 2026-05-27T20:21:29Z

+  } catch {
+    if (operatorAddress) {
+      try {
+        const supplierExists = await checkSupplierOnChain(operatorAddress);


checkSupplierOnChain only checks whether a supplier exists for the operator address — it doesn't validate ownership. Since this path is the fallback after verifyTransaction exhausts its retries (30 attempts × 60s ≈ 30min), we land here precisely in suspicious scenarios: tx was never found on-chain via hash nor block scan.

Failure case: another operator (or the same owner via a parallel route) stakes the same operator address while our original tx was lost or failed. Tier 4 sees a supplier exists with that operator and returns true. The workflow then marks the tx as Success, code = 0, and fires createNewNodesFromTransaction + notifyProviderOfStakedAddresses below, creating node records in our DB that point to a supplier we don't own.

Minimum: have checkSupplierOnChain receive the expected ownerAddress and validate supplier.ownerAddress === expectedOwner before returning true. If owner doesn't match, return false (or an explicit differentOwner signal) and leave the tx as Failure.

jorgecuesta · 2026-05-27T20:21:29Z

+      parsed.targetAddressGroupId,
+    )
+
+    await TriggerAddressGroupMigrationSchedule()


The await TriggerAddressGroupMigrationSchedule() doesn't inspect the ActionResult. If the schedule doesn't exist (fresh instance), Temporal is down, or the trigger returns { success: false, error: ... }, this action still returns success: true with the migrated count. The operator sees "X migrated" and assumes the remediation flow is running, but the keys sit in the new address group with nothing to re-stake them.

const trigger = await TriggerAddressGroupMigrationSchedule() if (!trigger.success) { throw new Error(trigger.error.message) }

Even better: move the trigger inside the same DB transaction so that if the schedule fails, the address group change doesn't stick.

jorgecuesta · 2026-05-27T20:21:29Z

+
+    for (const chunkIds of chunks) {
+      const keys = await tx
+        .select({ id: keysTable.id, state: keysTable.state, remediationHistory: keysTable.remediationHistory })


The SELECT against keysTable inside the transaction doesn't take a lock (FOR UPDATE). The CASE expression built in historyCase embeds the current remediationHistory of each key and then writes it back.

If two migrations run in parallel over overlapping IDs (Temporal worker handling two distinct workflows, the operator clicking "Migrate" twice quickly, or the schedule re-firing before the previous run finishes), both read the same history snapshot, compute their CASE expression, and the second write clobbers the first. A history entry is lost silently.

Fix: .for('update') on the select.

jorgecuesta · 2026-05-27T20:21:29Z

+      if (stateResetIds.length > 0) {
+        await tx
+          .update(keysTable)
+          .set({ state: KeyState.Staked })


This second UPDATE only filters by inArray(id, stateResetIds), not by state. stateResetIds was computed from the in-memory snapshot of the prior SELECT — there's no guarantee the state is still AttentionNeeded/RemediationFailed at the time of this UPDATE.

If a concurrent workflow (SupplierStatus, SupplierRemediation) advanced the state of one of those keys between the SELECT and this UPDATE, you blindly reset it back to Staked, which can revert a legitimate transition (e.g. an in-flight Unstake).

Fix: add the guard to the WHERE:

.where(and( inArray(keysTable.id, stateResetIds), inArray(keysTable.state, STATES_REQUIRING_RESET), ))

Combined with FOR UPDATE on the SELECT above it's fully covered.

Alann27 and others added 5 commits April 8, 2026 23:00

chore(deploy): update staging image to 4506d47

bdf1206

chore(deploy): update staging image to f59bab8

b9837dd

Alann27 requested a review from jorgecuesta April 20, 2026 18:35

Alann27 self-assigned this Apr 20, 2026

Alann27 added bug Something isn't working enhancement New feature or request labels Apr 20, 2026

marking transaction as failure if we couldn't verify it after 30 blocks

a29e77d

github-actions Bot force-pushed the staging branch from d999d44 to b269662 Compare May 15, 2026 02:44

jorgecuesta reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: mark stake transactions correctly when they land more than one block after broadcast#278

fix: mark stake transactions correctly when they land more than one block after broadcast#278
Alann27 wants to merge 6 commits into
stagingfrom
bug/fix-transactions-wait

Alann27 commented Apr 20, 2026 •

edited

Loading

Uh oh!

jorgecuesta commented Apr 20, 2026

Uh oh!

Alann27 commented Apr 20, 2026 •

edited

Loading

Uh oh!

jorgecuesta commented Apr 20, 2026

Uh oh!

jorgecuesta left a comment

Uh oh!

jorgecuesta May 27, 2026

Uh oh!

jorgecuesta May 27, 2026

Uh oh!

jorgecuesta May 27, 2026

Uh oh!

jorgecuesta May 27, 2026

Uh oh!

jorgecuesta May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -127,20 +139,58 @@ export async function ExecuteTransaction(args: TransactionArgs) {
		await waitForNextBlock(txHeight);

Conversation

Alann27 commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

What changed

Behavior matrix

Uh oh!

jorgecuesta commented Apr 20, 2026

Uh oh!

Alann27 commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorgecuesta commented Apr 20, 2026

Uh oh!

jorgecuesta left a comment

Choose a reason for hiding this comment

Uh oh!

jorgecuesta May 27, 2026

Choose a reason for hiding this comment

Uh oh!

jorgecuesta May 27, 2026

Choose a reason for hiding this comment

Uh oh!

jorgecuesta May 27, 2026

Choose a reason for hiding this comment

Uh oh!

jorgecuesta May 27, 2026

Choose a reason for hiding this comment

Uh oh!

jorgecuesta May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Alann27 commented Apr 20, 2026 •

edited

Loading

Alann27 commented Apr 20, 2026 •

edited

Loading