Remove `ThreadSafeVarInfo` #1023

penelopeysm · 2025-08-17T18:54:53Z

Summary

This PR removes ThreadSafeVarInfo.

In its place, a @pobserve macro is added to enable multithreaded tilde-observe statements, according to the plan outlined in #924 (comment). Broadly speaking, the following

@model function f(x)
    a ~ Normal()
    @pobserve for i in eachindex(x)
        b = my_fancy_calculation(a)
        x[i] ~ Normal(b)
    end
end

is converted into (modulo variable names)

@model function f(x)
    a ~ Normal()
    thread_results = map(eachindex(x)) do i
        Threads.@spawn begin
            loglike = zero(DynamicPPL.getloglikelihood(__varinfo__))
            b = my_fancy_calculation(a)
            loglike += Distributions.logpdf(Normal(b), x[i])
            loglike
        end
    end
    __varinfo__ = DynamicPPL.accloglikelihood!!(__varinfo__, sum(fetch.(thread_results)))
end

No actual varinfo manipulation happens inside the Threads.@spawn: instead, the log-likelihood contributions are calculated in each thread, then summed after the individual threads have finished their tasks. Because of this, there is no need to maintain one log-likelihood accumulator per thread, and consequently no need for ThreadSafeVarInfo.

Closes #429.
Closes #924.
Closes #947.

Why?

Code simplification in DynamicPPL, and reducing the number of AbstractVarInfo subtypes, is obviously a big argument.

But in fact, that's not my main motivation. I'm mostly motivated to do this because TSVI in general is IMO not good code: it works, but in many ways it's a hack.

Any time Julia is launched with more than 1 thread, all models will be executed with TSVI, even if there is no parallelisation within the model itself. This violates a general principle that users should only 'pay for what they need'.
TSVI encourages users to use Threads.@threads for i in x ... end, and then internally we use Threads.threadid() to index into a vector of accumulators. This is now regarded as "incorrect parallel code that contains the possibility of race conditions which can give wrong results". See https://julialang.org/blog/2023/07/PSA-dont-use-threadid/ and https://discourse.julialang.org/t/behavior-of-threads-threads-for-loop/76042.
Furthermore, to determine the requisite length for that vector of accumulators, we currently use Threads.nthreads() * 2 which is a hacky heuristic. The correct solution would be Threads.maxthreadid(), but Mooncake couldn't differentiate through that.
In fact, even the correct solution is not actually correct. Quoting from Julia blog post above, "relying on threadid, nthreads and even maxthreadid [is] perilous. Any code that relies on a specific threadid staying constant, or on a constant number of threads during execution, is bound to be incorrect.".
In general TSVI is difficult to make type-stable. See, e.g., Accumulators, stage 1 #885 (comment).
The choice of whether to use TSVI or not is determined by if Threads.nthreads() > 1, which cannot be determined at compile time. This means that:
- Extra effort is needed to make sure that, not only is TSVI type stable, but both TSVI and non-TSVI branches in evaluate!! must be together type stable.
- Even though Mooncake can't differentiate through multithreaded code, we still need to make sure it's able to differentiate through TSVI, otherwise it can't differentiate through evaluate!!. That's just silly IMO.
- Similar considerations with Enzyme. There have been at least three issues with Enzyme directly caused by this: Turing models with only one parameter assertion error on reverse-mode EnzymeAD/Enzyme.jl#2337, Instruction does not dominate all uses / function failed verification (probably to do with NamedTuples stuff) EnzymeAD/Enzyme.jl#2512, Illegal loop cache type crash in cacheForReverse EnzymeAD/Enzyme.jl#2518

Does this actually work?

This PR has no tests yet, but I ran this locally and the log-likelihood gets accumulated correctly:

julia> using DynamicPPL, Distributions

julia> @model function f(x)
           @pobserve for i in eachindex(x)
               println(Threads.threadid(), "->", x[i])
               x[i] ~ Normal()
           end
           return DynamicPPL.getlogp(__varinfo__)
       end
f (generic function with 2 methods)

julia> f([1.0, 2.0])() # note that this was run with 2 threads
2->2.0
1->1.0
(logprior = 0.0, logjac = 0.0, loglikelihood = -4.337877066409345)

julia> logpdf(Normal(), 1.0) + logpdf(Normal(), 2.0) # for comparison
-4.337877066409345

I can also confirm that the parallelisation is correctly occurring with this model:

using DynamicPPL, Distributions
@model function g(x)
    @pobserve for i in eachindex(x)
        # can't use Base.sleep as that doesn't fully block
        Libc.systemsleep(1.0)
        x[i] ~ Normal()
    end
end
println(Threads.nthreads())
@time g([1.0, 2.0])()

If you run this with 1 thread it takes 2 seconds, and if you run it with 2 threads it takes 1 second.

It also works correctly with MCMCThreads() (with some minor adjustments to Turing.jl for compatibility with this branch). NOTE: Sampling with @pobserve is now fully reproducible, whereas Threads.@threads was not reproducible even when seeded.

using Turing, DynamicPPL, Random
@model function h(y)
    x ~ MvNormal(zeros(length(y)), I)
    @pobserve for i in eachindex(y)
        y[i] ~ Normal(x[i])
    end
end
chn = sample(Xoshiro(468), h([1.0, 2.0, 3.0]), NUTS(), MCMCThreads(), 2000, 4; check_model=false)
describe(chn)

What now?

There are a handful of limitations to this PR. These are the ones I can think of right now:

Fixed ✅ ~~It will crash if the VarInfo used for evaluation does not have a likelihood accumulator.~~
It only works with likelihood terms. This mimics the pre-0.37 behaviour but in principle, 0.37 does allow users to accumulate prior probabilities (or any accumulator) in a thread-safe manner. Of course, they can't do it with tilde statements; they can only do it by calling something like DynamicPPL.acclogprior!!().
It doesn't work with .~ (or maybe it does, I haven't tested, but my guess is that it will bug out)
It doesn't work with conditioned values.
If x is not a model argument or conditioned upon, this will yield wrong results for the typical x = Vector{Float64}(undef, 2); @pobserve for i in eachindex(x); x[i] ~ dist; end as it will naively accumulate logpdf(dist, x[i]) even though this should be an assumption rather than observation
Fixed ✅ ~~There is no way to extract other computations from the threads.~~
Libtask doesn't work with Threads.@spawn, so PG will throw an error with @pobserve.
@pobserve is a bit too unambitious. If one day we make it work with assume, then it will have to be renamed, i.e. a breaking change.

I believe that all of these are either unimportant or can be worked around with some additional macro leg-work:

Not important, nobody is running around evaluating their models with no likelihood accumulator. Not even Turing does this. Also easy enough for us to guard against by wrapping the entire thing in an if/else.
You can still manually calculate log-prior terms in a thread and then do a single acclogprior!! outside the threaded bit.
This is a bit boilerplate-y but otherwise quite straightforward to fix.
This can be fixed by performing the same checks on the tilde lhs that we do in the main model macro.
Same as (4). Note that this is broadly also a problem with non-parallel models (it's just the inverse problem of Derived variables from data on the LHS of tilde #519) and in general Forcing all LHS variables of tilde to be part of model arguments #965 or similar 'static VarInfo' approaches would fix this.
~~This can be fixed easily by changing the macro to return a tuple of (retval, loglike) rather than just loglike.~~
This is actually an improvement over the current behaviour because right now PG silently yields incorrect results with Threads.@threads.
I am more than happy to take ideas for other names.

So for now this should mostly be considered a proof of principle rather than a complete PR.

Finally, note that this PR already removes > 550 lines of code but this is not a full picture of the simplification afforded. For example, I did not remove the split, combine, and convert_eltype methods on accumulators, which I believe can either be removed or simplified once TSVI is removed.

* Implement InitContext * Fix loading order of modules; move `prefix(::Model)` to model.jl * Add tests for InitContext behaviour * inline `rand(::Distributions.Uniform)` Note that, apart from being simpler code, Distributions.Uniform also doesn't allow the lower and upper bounds to be exactly equal (but we might like to keep that option open in DynamicPPL, e.g. if the user wants to initialise all values to the same value in linked space). * Document * Add a test to check that `init!!` doesn't change linking * Fix `push!` for VarNamedVector This should have been changed in #940, but slipped through as the file wasn't listed as one of the changed files. * Add some line breaks Co-authored-by: Markus Hauru <[email protected]> * Add the option of no fallback for ParamsInit * Improve docstrings * typo * `p.default` -> `p.fallback` * Rename `{Prior,Uniform,Params}Init` -> `InitFrom{Prior,Uniform,Params}` --------- Co-authored-by: Markus Hauru <[email protected]>

codecov · 2025-08-17T19:04:40Z

Codecov Report

❌ Patch coverage is 83.88626% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.91%. Comparing base (3db45ee) to head (700ed07).
⚠️ Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
src/pobserve_macro.jl	62.79%	16 Missing ⚠️
src/simple_varinfo.jl	50.00%	6 Missing ⚠️
src/test_utils/contexts.jl	83.33%	5 Missing ⚠️
src/model.jl	76.92%	3 Missing ⚠️
src/contexts/init.jl	98.07%	1 Missing ⚠️
src/model_utils.jl	50.00%	1 Missing ⚠️
src/test_utils/model_interface.jl	0.00%	1 Missing ⚠️
src/test_utils/sampler.jl	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1023      +/-   ##
==========================================
- Coverage   82.34%   80.91%   -1.44%     
==========================================
  Files          38       39       +1     
  Lines        3949     3810     -139     
==========================================
- Hits         3252     3083     -169     
- Misses        697      727      +30

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

github-actions · 2025-08-17T19:04:57Z

Benchmark Report for Commit `700ed07`

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  8.3 |                 1.5 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                667.1 |                43.8 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                430.9 |                54.2 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1021.1 |                35.8 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6756.2 |                28.0 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1030.7 |                40.7 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |               1013.5 |                 4.6 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5804.9 |                 4.5 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1001.8 |                 9.1 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              65771.6 |                 4.0 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8937.5 |                 9.9 |
|               Dynamic |        10 |    mooncake |             typed |   true |                129.6 |                11.7 |
|              Submodel |         1 |    mooncake |             typed |   true |                 12.1 |                 5.3 |
|                   LDA |        12 | reversediff |             typed |   true |               1057.8 |                 2.3 |

coveralls · 2025-08-17T19:06:48Z

Pull Request Test Coverage Report for Build 17024844925

Details

9 of 32 (28.13%) changed or added relevant lines in 5 files are covered.
6 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.6%) to 81.966%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/pobserve_macro.jl	0	23	0.0%

Files with Coverage Reduction	New Missed Lines	%
src/abstract_varinfo.jl	1	75.42%
src/compiler.jl	1	86.98%
src/varinfo.jl	4	86.21%

Totals
Change from base Build 16942570157:	-0.6%
Covered Lines:	3127
Relevant Lines:	3815

💛 - Coveralls

github-actions · 2025-08-17T19:14:29Z

DynamicPPL.jl documentation for PR #1023 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1023/

penelopeysm · 2025-08-17T20:15:38Z

As a bonus, this PR completely fixes all Enzyme issues arising from DPPL 0.37. #947

* use `varname_leaves` from AbstractPPL instead * add changelog entry * fix import

…!`, `predict`, `returned`, and `initialize_values` (#984) * Replace `evaluate_and_sample!!` -> `init!!` * Use `ParamsInit` for `predict`; remove `setval_and_resample!` and friends * Use `init!!` for initialisation * Paper over the `Sampling->Init` context stack (pending removal of SamplingContext) * Remove SamplingContext from JETExt to avoid triggering `Sampling->Init` pathway * Remove `predict` on vector of VarInfo * Fix some tests * Remove duplicated test * Simplify context testing * Rename FooInit -> InitFromFoo * Fix JETExt * Fix JETExt properly * Fix tests * Improve comments * Remove duplicated tests * Docstring improvements Co-authored-by: Markus Hauru <[email protected]> * Concretise `chain_sample_to_varname_dict` using chain value type * Clarify testset name * Re-add comment that shouldn't have vanished * Fix stale Requires dep * Fix default_varinfo/initialisation for odd models * Add comment to src/sampler.jl Co-authored-by: Markus Hauru <[email protected]> --------- Co-authored-by: Markus Hauru <[email protected]>

penelopeysm · 2025-09-19T12:08:32Z

Unfortuntately I don't know how to deal with conditioned/fixed variables without a huge amount of faff and macro code duplication 😮‍💨

yebai · 2025-09-19T13:59:46Z

This requires a bit more discussion before we make a commitment -- not entirely sure we should introduce a new macro.

penelopeysm · 2025-09-19T14:02:28Z

yeah, I remember the discussion we had a few meetings ago

penelopeysm · 2025-10-21T23:56:48Z

holy merge conflicts

penelopeysm and others added 6 commits August 8, 2025 11:21

Bump minor version

b1cdc2a

Merge branch 'main' into breaking

f742103

bump benchmarks compat

5a9e9d2

Merge branch 'main' into breaking

f4db67a

add a skeletal changelog

7b55aa3

penelopeysm marked this pull request as draft August 17, 2025 18:55

github-actions bot assigned penelopeysm Aug 17, 2025

penelopeysm force-pushed the py/pobserve branch from f797af7 to 79bedaf Compare August 17, 2025 19:12

penelopeysm changed the title ~~Attempt to remove TSVI~~ Remove ThreadSafeVarInfo Aug 17, 2025

penelopeysm mentioned this pull request Aug 17, 2025

ThreadSafeVarInfo and threadid #924

Open

penelopeysm and others added 6 commits August 31, 2025 00:16

Merge branch 'main' into breaking

2d18ce3

use varname_leaves from AbstractPPL instead (#1030)

c8e5841

* use `varname_leaves` from AbstractPPL instead * add changelog entry * fix import

tidy occurrences of varname_leaves as well (#1031)

fead2a2

Merge branch 'main' into breaking

1e1cd94

Merge branch 'main' into breaking

2ca382e

penelopeysm force-pushed the py/pobserve branch from 79bedaf to 15d662c Compare September 19, 2025 10:35

penelopeysm added 2 commits September 19, 2025 11:52

Replace ThreadSafeVarInfo with @pobserve

d937157

Add changelog

afe3ba2

penelopeysm changed the base branch from main to breaking September 19, 2025 11:52

penelopeysm force-pushed the py/pobserve branch from 15d662c to d017e4b Compare September 19, 2025 11:52

penelopeysm changed the base branch from breaking to main September 19, 2025 11:54

penelopeysm changed the base branch from main to breaking September 19, 2025 11:54

Allow user to return things from inside pobserve

b3b97da

penelopeysm added 2 commits September 19, 2025 12:54

Add some tests

99f5695

Make pobserve work with VarInfos that have no likelihood accumulator

1f7fef3

penelopeysm force-pushed the py/pobserve branch from f4d4fbf to 1f7fef3 Compare September 19, 2025 11:54

fix expr bug

700ed07

Base automatically changed from breaking to main October 21, 2025 17:06

penelopeysm mentioned this pull request Oct 21, 2025

v0.39 #1082

Open

6 tasks

penelopeysm closed this Oct 21, 2025

penelopeysm mentioned this pull request Oct 23, 2025

Remove TSVI or at least make TSVI opt-in #1086

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove `ThreadSafeVarInfo` #1023

Remove `ThreadSafeVarInfo` #1023

Uh oh!

penelopeysm commented Aug 17, 2025 •

edited

Loading

Uh oh!

codecov bot commented Aug 17, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 17, 2025 •

edited

Loading

Uh oh!

coveralls commented Aug 17, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 17, 2025

Uh oh!

penelopeysm commented Aug 17, 2025

Uh oh!

penelopeysm commented Sep 19, 2025

Uh oh!

yebai commented Sep 19, 2025

Uh oh!

penelopeysm commented Sep 19, 2025

Uh oh!

penelopeysm commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Remove ThreadSafeVarInfo #1023

Remove ThreadSafeVarInfo #1023

Uh oh!

Conversation

penelopeysm commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why?

Does this actually work?

What now?

Uh oh!

codecov bot commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Report for Commit 700ed07

Computer Information

Benchmark Results

Uh oh!

coveralls commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 17024844925

Details

💛 - Coveralls

Uh oh!

github-actions bot commented Aug 17, 2025

Uh oh!

penelopeysm commented Aug 17, 2025

Uh oh!

penelopeysm commented Sep 19, 2025

Uh oh!

yebai commented Sep 19, 2025

Uh oh!

penelopeysm commented Sep 19, 2025

Uh oh!

penelopeysm commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Remove `ThreadSafeVarInfo` #1023

Remove `ThreadSafeVarInfo` #1023

penelopeysm commented Aug 17, 2025 •

edited

Loading

codecov bot commented Aug 17, 2025 •

edited

Loading

github-actions bot commented Aug 17, 2025 •

edited

Loading

Benchmark Report for Commit `700ed07`

coveralls commented Aug 17, 2025 •

edited

Loading