@tantaman tantaman commented Oct 22, 2025

Overview

The current limit algorithm is pretty simple:

  1. Compute selectivity for each connection. This is est_rows_with_filters / est_rows_without_filters
  2. In each semi-join, divide the parent limit by the child selectivity to get the "scan_est"

The intuition here is that if a child is very selective we must iterate more rows on the parent side before finding a match on the child side.

Example:
issue.whereExists('creator').limit(10)

Every issue has a creator so the selectivity of the child is 1. This means we have: scan_est = 10/1 = 10. I.e., we'll have to scan 10 rows before fulfilling our limit of 10.

If it were:
issue.whereExists('creator', q => q.name('ff')).limit(10)

The cost model gives us the estimated number of creator rows with a given name. Say it is 10 out of 1,000. Our selectivity is then 10/1000 = 0.01.

A selectivity of 0.01 gives scan_est = 10/0.01 = 1,000: we may have to scan 1,000 parent rows before finding 10 matches from the child.
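The two steps and both examples above can be sketched as a pair of small functions. This is only an illustration of the arithmetic: the names `selectivity` and `scanEstimate`, their parameters, and the assumed 1,000-row creator table are made up for this sketch and are not the planner's API.

```typescript
// Hypothetical helpers illustrating the two-step limit estimate.
// Names and signatures are assumptions, not the planner's actual code.

/** Fraction of rows that survive the filters (1 means every row passes). */
function selectivity(estRowsWithFilters: number, estRowsWithoutFilters: number): number {
  if (estRowsWithoutFilters === 0) return 0; // empty table: nothing can match
  return estRowsWithFilters / estRowsWithoutFilters;
}

/** Parent rows we expect to scan before `limit` of them find a child match. */
function scanEstimate(limit: number, childSelectivity: number): number {
  if (childSelectivity === 0) return Infinity; // no child ever matches
  return limit / childSelectivity;
}

// Every issue has a creator: selectivity 1, so 10 scanned rows fill a limit of 10.
console.log(scanEstimate(10, selectivity(1000, 1000))); // 10

// 10 of 1,000 creators match the name filter: selectivity 0.01.
console.log(scanEstimate(10, selectivity(10, 1000))); // 1000
```

Returning `Infinity` for a selectivity of 0 is a choice made for this sketch; a planner would more likely cap the estimate at the size of the parent table.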

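// Default: assume we scan the entire parent.
let scanEst = parentCost.baseCardinality;
if (this.#type === 'semi' && parentCost.limit !== undefined) {
  // A selectivity of 0 means no child row matches, so keep the full-scan
  // estimate (this also avoids dividing by zero).
  if (childCost.selectivity !== 0) {
    // Cap the limit-based estimate at the parent's cardinality.
    scanEst = Math.min(scanEst, parentCost.limit / childCost.selectivity);
  }
}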
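For readers who want to run the logic outside the planner class, here is a standalone restatement of the excerpt. The `Cost` type, the `'semi' | 'other'` join-type union, and the function name `estimateParentScan` are assumptions for this sketch; only the fields `baseCardinality`, `limit`, and `selectivity` come from the excerpt.

```typescript
// Standalone sketch of the excerpt. The `Cost` shape, the join-type union,
// and the function name are hypothetical.
type Cost = {
  baseCardinality: number; // estimated rows before any limit is applied
  selectivity: number; // fraction of rows passing filters, in [0, 1]
  limit?: number; // row limit on this side of the join, if any
};

function estimateParentScan(type: 'semi' | 'other', parent: Cost, child: Cost): number {
  // Default: assume the whole parent is scanned.
  let scanEst = parent.baseCardinality;
  if (type === 'semi' && parent.limit !== undefined && child.selectivity !== 0) {
    // Never estimate more rows than the parent actually has.
    scanEst = Math.min(scanEst, parent.limit / child.selectivity);
  }
  return scanEst;
}

const filteredChild: Cost = {baseCardinality: 1000, selectivity: 0.01};

// Large parent: the limit-based estimate (10 / 0.01) wins.
console.log(estimateParentScan('semi', {baseCardinality: 50000, selectivity: 1, limit: 10}, filteredChild)); // 1000

// Small parent: the estimate is capped at the parent's 500 rows.
console.log(estimateParentScan('semi', {baseCardinality: 500, selectivity: 1, limit: 10}, filteredChild)); // 500

// No limit: fall back to a full scan of the parent.
console.log(estimateParentScan('semi', {baseCardinality: 500, selectivity: 1}, filteredChild)); // 500
```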

Problem

The simple algorithm isn't quite complete. Think of this query:

user.whereExists('comments', q => q.whereCreated(gt(date)))

The filter on comments may be highly selective (only a small fraction of comment rows pass), but if each user has created many comments, the chance that a given user has at least one matching comment goes up.

Example

Say we have users 1,2,3,4,5

And a comment table. Below are user ids from the comment table, assuming every user created 2 comments.

[4, 1, 3, 5, 2, 1, 5, 4, 3, 2]

If our filters reduced the table to half its rows:

[1, 5, 4, 3, 2]

Selectivity is 0.5, so the estimated scan is 1/0.5 = 2 user rows to find one match.

But all user ids are present in the filtered set! We only scan 1 user row to find a matching comment. Matching is faster because the fanout from user->comment is greater than 1 (here it is 2).
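The gap between the row-level selectivity and what a scan over users actually experiences can be checked directly against the example data. This is a small sketch using the arrays from the example; the variable names are made up here.

```typescript
// Data copied from the example: user ids and the comment table's user-id column.
const users = [1, 2, 3, 4, 5];
const commentUserIds = [4, 1, 3, 5, 2, 1, 5, 4, 3, 2];
// The comment rows that survive the filter in the example (half the table).
const filteredCommentUserIds = [1, 5, 4, 3, 2];

// Row-level selectivity of the filter on comments.
const rowSelectivity = filteredCommentUserIds.length / commentUserIds.length;

// Fraction of users that still have at least one matching comment.
const matchedUsers = new Set(filteredCommentUserIds);
const userMatchRate = users.filter(id => matchedUsers.has(id)).length / users.length;

console.log(rowSelectivity); // 0.5
console.log(userMatchRate); // 1
console.log(1 / rowSelectivity); // 2, the naive scan estimate
console.log(1 / userMatchRate); // 1, what happens for this particular data
```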

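So we need to adjust for fanout: selectivity = 1 - Math.pow(1 - filterSelectivity, fanout). With filterSelectivity = 0.5 and fanout = 2 this gives 1 - 0.25 = 0.75, so scan_est = 1 / 0.75 = 1.33.
That is much closer to the observed 1 than the naive estimate of 2.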
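The fanout adjustment can be sketched as a one-line function. The formula is the probability of at least one success in `fanout` trials, so it implicitly treats each child row as passing the filter independently; the function name `fanoutAdjustedSelectivity` is hypothetical.

```typescript
/**
 * Probability that a parent row has at least one child passing the filter,
 * assuming each of its `fanout` children passes independently with
 * probability `filterSelectivity`. The function name is hypothetical.
 */
function fanoutAdjustedSelectivity(filterSelectivity: number, fanout: number): number {
  return 1 - Math.pow(1 - filterSelectivity, fanout);
}

// Values from the example: half the comments pass, two comments per user.
const adjusted = fanoutAdjustedSelectivity(0.5, 2);
console.log(adjusted); // 0.75
console.log(1 / adjusted); // 1.3333333333333333

// With a fanout of 1 the formula reduces to the plain filter selectivity.
console.log(fanoutAdjustedSelectivity(0.5, 1)); // 0.5
```

The estimate of 1.33 still differs from the 1 row observed in the example, since that data happens to keep exactly one comment per user rather than a random half of the rows.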

More on this is documented in packages/zql/src/planner/SELECTIVITY_PLAN.md

Future notes to self:

  • Should we throw out filters that are not indexed by SQLite when computing selectivity?
  • We currently penalize the plan if it creates a temp b-tree. This penalty is not applied in our parentCost.limit / childCost.selectivity algorithm. Postgres has a separate cost, "startup_cost", that captures things like creating temp indices.
  • Consider how join keys impact selectivity. We likely need to recompute selectivity after constraint propagation.

github-actions bot commented Oct 22, 2025

🐰 Bencher Report

Branch: mlaw/planner-limit
Testbed: Linux

| Benchmark | File Size (KB) | Result Δ% | Baseline (KB) | Upper Boundary (KB) | Limit % |
|---|---|---|---|---|---|
| zero-package.tgz | 1,389.43 | 0.00% | 1,389.43 | 1,417.22 | 98.04% |
| zero.js | 228.62 | 0.00% | 228.62 | 233.19 | 98.04% |
| zero.js.br | 63.70 | 0.00% | 63.70 | 64.97 | 98.04% |

github-actions bot commented Oct 22, 2025

🐰 Bencher Report

Branch: mlaw/planner-limit
Testbed: self-hosted

| Benchmark | Throughput (ops/s x 1e3) | Result Δ% | Baseline (ops/s x 1e3) | Lower Boundary (ops/s x 1e3) | Limit % |
|---|---|---|---|---|---|
| src/client/custom.bench.ts > big schema | 905.89 | +1.66% | 891.10 | 824.86 | 91.05% |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 2.97 | +3.38% | 2.88 | 2.76 | 92.74% |
| src/client/zero.bench.ts > pk compare > pk = N | 47.37 | +4.48% | 45.34 | 43.11 | 91.01% |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 3.93 | -5.47% | 4.16 | 3.89 | 98.87% |

github-actions bot commented Oct 22, 2025

🐰 Bencher Report

Branch: mlaw/planner-limit
Testbed: self-hosted

🚨 7 Alerts

| Benchmark | Measure | Result (Δ%) | Baseline | Lower Boundary (Limit %) |
|---|---|---|---|---|
| unplanned: playlist.exists(tracks) | Throughput | 916.32 ops/s (-5.01%) | 964.60 ops/s | 924.74 ops/s (100.92%) |
| unplanned: track.exists(album) where title="Big Ones" | Throughput | 33.37 ops/s (-6.09%) | 35.53 ops/s | 33.77 ops/s (101.21%) |
| unplanned: track.exists(album).exists(genre) | Throughput | 20.80 ops/s (-4.39%) | 21.75 ops/s | 20.96 ops/s (100.80%) |
| zql: edit for limited query, inside the bound | Throughput | 224.29 ops/s x 1e3 (-4.43%) | 234.68 ops/s x 1e3 | 224.52 ops/s x 1e3 (100.10%) |
| zqlite: (table scan) select * from album | Throughput | 1.21 ops/s x 1e3 (-11.25%) | 1.36 ops/s x 1e3 | 1.24 ops/s x 1e3 (102.68%) |
| zqlite: all playlists | Throughput | 1.35 ops/s (-8.77%) | 1.48 ops/s | 1.37 ops/s (102.00%) |
| zqlite: push into unlimited query | Throughput | 117.96 ops/s x 1e3 (-10.52%) | 131.83 ops/s x 1e3 | 121.50 ops/s x 1e3 (103.00%) |

All benchmark results

| Benchmark | Throughput (ops/s) | Result Δ% | Baseline (ops/s) | Lower Boundary (ops/s) | Limit % | Alert |
|---|---|---|---|---|---|---|
| planned: playlist.exists(tracks) | 952.28 | +898.19% | 95.40 | -428.18 | -44.96% | |
| planned: track.exists(album) OR exists(genre) | 22.33 | -2.16% | 22.83 | 21.93 | 98.20% | |
| planned: track.exists(album) where title="Big Ones" | 8,934.04 | +6.15% | 8,416.58 | 7,717.96 | 86.39% | |
| planned: track.exists(album).exists(genre) | 25.45 | -2.56% | 26.12 | 25.23 | 99.14% | |
| planned: track.exists(album).exists(genre) with filters | 5,375.64 | -1.46% | 5,455.44 | 5,263.10 | 97.91% | |
| planned: track.exists(playlists) | 6.40 | +709.10% | 0.79 | -2.64 | -41.19% | |
| unplanned: playlist.exists(tracks) | 916.32 | -5.01% | 964.60 | 924.74 | 100.92% | 🚨 |
| unplanned: track.exists(album) OR exists(genre) | 22.29 | -1.92% | 22.72 | 21.84 | 98.02% | |
| unplanned: track.exists(album) where title="Big Ones" | 33.37 | -6.09% | 35.53 | 33.77 | 101.21% | 🚨 |
| unplanned: track.exists(album).exists(genre) | 20.80 | -4.39% | 21.75 | 20.96 | 100.80% | 🚨 |
| unplanned: track.exists(album).exists(genre) with filters | 35.17 | -2.22% | 35.97 | 34.58 | 98.31% | |
| unplanned: track.exists(playlists) | 6.36 | -1.23% | 6.44 | 6.18 | 97.25% | |
| zpg: (pk lookup) select * from track where id = 3163 | 1,141.40 | +4.96% | 1,087.51 | 894.84 | 78.40% | |
| zpg: (secondary index lookup) select * from track where album_id = 248 | 1,080.43 | -4.60% | 1,132.47 | 1,048.97 | 97.09% | |
| zpg: (table scan) select * from album | 715.36 | -4.05% | 745.54 | 675.06 | 94.37% | |
| zpg: OR with empty branch and limit | 947.32 | +4.16% | 909.50 | 730.42 | 77.10% | |
| zpg: OR with empty branch and limit with exists | 741.48 | -4.51% | 776.54 | 701.05 | 94.55% | |
| zpg: all playlists | 5.59 | +111.20% | 2.65 | 0.85 | 15.17% | |
| zpg: scan with one depth related | 431.82 | +54.75% | 279.04 | 180.83 | 41.88% | |
| zql: (pk lookup) select * from track where id = 3163 | 120,766.66 | -11.66% | 136,703.84 | 117,715.25 | 97.47% | |
| zql: (secondary index lookup) select * from track where album_id = 248 | 2,046.13 | -2.07% | 2,089.47 | 1,607.18 | 78.55% | |
| zql: (table scan) select * from album | 6,639.40 | -1.66% | 6,751.62 | 5,987.38 | 90.18% | |
| zql: OR with empty branch and limit | 57,344.02 | -0.53% | 57,651.71 | 48,886.70 | 85.25% | |
| zql: OR with empty branch and limit with exists | 12,627.06 | -1.63% | 12,836.70 | 12,048.09 | 95.41% | |
| zql: all playlists | 4.25 | -5.13% | 4.48 | 4.22 | 99.30% | |
| zql: edit for limited query, inside the bound | 224,287.41 | -4.43% | 234,682.92 | 224,517.25 | 100.10% | 🚨 |
| zql: edit for limited query, outside the bound | 230,140.77 | -5.36% | 243,183.15 | 221,519.32 | 96.25% | |
| zql: push into limited query, inside the bound | 108,098.44 | -6.51% | 115,626.54 | 107,421.94 | 99.37% | |
| zql: push into limited query, outside the bound | 440,950.32 | -4.75% | 462,934.62 | 414,006.01 | 93.89% | |
| zql: push into unlimited query | 330,729.14 | -9.44% | 365,220.35 | 330,166.87 | 99.83% | |
| zql: scan with one depth related | 477.23 | -2.67% | 490.33 | 461.04 | 96.61% | |
| zqlite: (pk lookup) select * from track where id = 3163 | 44,710.12 | -1.23% | 45,267.34 | 41,979.45 | 93.89% | |
| zqlite: (secondary index lookup) select * from track where album_id = 248 | 10,771.32 | -5.61% | 11,411.98 | 10,108.04 | 93.84% | |
| zqlite: (table scan) select * from album | 1,207.03 | -11.25% | 1,360.07 | 1,239.37 | 102.68% | 🚨 |
| zqlite: OR with empty branch and limit | 19,160.67 | +0.75% | 19,017.90 | 17,895.97 | 93.40% | |
| zqlite: OR with empty branch and limit with exists | 5,704.73 | -0.07% | 5,708.60 | 5,201.32 | 91.18% | |
| zqlite: all playlists | 1.35 | -8.77% | 1.48 | 1.37 | 102.00% | 🚨 |
| zqlite: edit for limited query, inside the bound | 121,880.77 | -1.45% | 123,673.17 | 117,084.73 | 96.06% | |
| zqlite: edit for limited query, outside the bound | 123,479.29 | -3.58% | 128,063.58 | 121,495.81 | 98.39% | |
| zqlite: push into limited query, inside the bound | 4,206.57 | -2.35% | 4,307.91 | 4,165.04 | 99.01% | |
| zqlite: push into limited query, outside the bound | 141,060.11 | -5.61% | 149,444.44 | 140,650.11 | 99.71% | |
| zqlite: push into unlimited query | 117,958.85 | -10.52% | 131,833.20 | 121,501.50 | 103.00% | 🚨 |
| zqlite: scan with one depth related | 161.54 | -2.00% | 164.85 | 151.97 | 94.07% | |

@tantaman tantaman (Contributor, Author) commented

@copilot - what's with the js / check types action failure? Everything passes locally

@tantaman tantaman enabled auto-merge October 24, 2025 15:44
@tantaman tantaman added this pull request to the merge queue Oct 24, 2025
Merged via the queue into main with commit 341dd8f Oct 24, 2025
16 of 19 checks passed
@tantaman tantaman deleted the mlaw/planner-limit branch October 24, 2025 15:56