Skip to content

Second try to fix producer test intermittency#921

Merged
brandur merged 1 commit into
masterfrom
brandur-fix-producer-intermittency
May 24, 2025
Merged

Second try to fix producer test intermittency#921
brandur merged 1 commit into
masterfrom
brandur-fix-producer-intermittency

Conversation

@brandur

@brandur brandur commented May 23, 2025

Copy link
Copy Markdown
Contributor

Last week I tried to fix an extraneous producer log on shutdown in #896.
Unfortunately it still happens, and worse yet, it actually leads to test
failures if the logging happens during an example test where it gets
recorded (example [1]).

--- FAIL: Example_encryptHook (0.39s)
got:
Secret message: This message is encrypted in the database, but plaintext in workers.
producer: Producer status update, error updating in database
want:
Secret message: This message is encrypted in the database, but plaintext in workers.
FAIL
FAIL    riverqueue.com/riverpro/riverencrypt    0.556s

Looking over this code again, I think the problem was the addition of
this cluster of lines:

var subroutineWG sync.WaitGroup
subroutineWG.Add(3)
subroutineCtx, cancelSubroutines := context.WithCancelCause(context.WithoutCancel(fetchCtx))
go p.heartbeatLogLoop(subroutineCtx, &subroutineWG)
go p.reportQueueStatusLoop(subroutineCtx, &subroutineWG)
go p.reportProducerStatusLoop(subroutineCtx, &subroutineWG)

Subroutines had been set up to not produce an error when they notice a
start/stop's startstop.ErrStop stopping error, but with the new
subroutineCtx and use of WithoutCancel, ErrStop will never make it
down to those goroutines. They don't know not to log on a context
cancellation, so errors are produced.

Here, try to address the problem by making sure there's an ErrStop in
the cancel error:

cancelSubroutines(fmt.Errorf("producer stopped: %w", startstop.ErrStop))

The checks to know whether to log an error are using errors.Is and
context.Cause, so this should work:

if err != nil && errors.Is(context.Cause(ctx), startstop.ErrStop) {
    return
}

[1] https://github.com/riverqueue/riverpro/actions/runs/15047355176/job/42293190805

Last week I tried to fix an extraneous producer log on shutdown in #896.
Unfortunately it still happens, and worse yet, it actually leads to test
failures if the logging happens during an example test where it gets
recorded (example [1]).

    --- FAIL: Example_encryptHook (0.39s)
    got:
    Secret message: This message is encrypted in the database, but plaintext in workers.
    producer: Producer status update, error updating in database
    want:
    Secret message: This message is encrypted in the database, but plaintext in workers.
    FAIL
    FAIL    riverqueue.com/riverpro/riverencrypt    0.556s

Looking over this code again, I think the problem was the addition of
this cluster of lines:

    var subroutineWG sync.WaitGroup
    subroutineWG.Add(3)
    subroutineCtx, cancelSubroutines := context.WithCancelCause(context.WithoutCancel(fetchCtx))
    go p.heartbeatLogLoop(subroutineCtx, &subroutineWG)
    go p.reportQueueStatusLoop(subroutineCtx, &subroutineWG)
    go p.reportProducerStatusLoop(subroutineCtx, &subroutineWG)

Subroutines had been set up to not produce an error when they notice a
start/stop's `startstop.ErrStop` stopping error, but with the new
`subroutineCtx` and use of `WithoutCancel`, `ErrStop` will never make it
down to those goroutines. They don't know not to log on a context
cancellation, so errors are produced.

Here, try to address the problem by making sure there's an `ErrStop` in
the cancel error:

    cancelSubroutines(fmt.Errorf("producer stopped: %w", startstop.ErrStop))

The checks to know whether to log an error are using `errors.Is` and
`context.Cause`, so this should work:

    if err != nil && errors.Is(context.Cause(ctx), startstop.ErrStop) {
        return
    }

[1] https://github.com/riverqueue/riverpro/actions/runs/15047355176/job/42293190805
@brandur brandur requested a review from bgentry May 23, 2025 07:33

@bgentry bgentry left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, hopefully that will take care of it!

@brandur brandur merged commit e5eed9d into master May 24, 2025
10 checks passed
@brandur brandur deleted the brandur-fix-producer-intermittency branch May 24, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants