Skip to content

Commit 341dd8f

Browse files
authored
chore(planner): consider limit when planning (#5048)
# Overview The current limit algorithm is pretty simple: 1. Compute selectivity for each connection. This is `est_rows_with_filters / est_rows_without_filters` 2. In each semi-join, divide the parent limit by the child selectivity to get the "scan_est" The intuition here is that if a child is _very selective_ we must iterate more rows on the parent side before finding a match on the child side. **Example:** `issue.whereExists('creator').limit(10)` Every issue has a creator so the selectivity of the child is 1. This means we have: `scan_est = 10/1 = 10`. I.e., we'll have to scan 10 rows before fulfilling our limit of 10. If it were: `issue.whereExists('creator', q => q.name('ff'))` The cost model will give us the estimated num of rows for a creator with a given name. Maybe it is 10 out of 1,000. This means our selectivity is: 10/1000 -> .01. .01 selectivity means we have: `scan_est = 10/.01 = 1,000` -> we will potentially have to scan 1,000 parent rows before finding 10 matches from the child. ```ts let scanEst = parentCost.baseCardinality; if (this.#type === 'semi' && parentCost.limit !== undefined) { if (childCost.selectivity !== 0) { scanEst = Math.min(scanEst, parentCost.limit / childCost.selectivity); } } ``` # Problem The simple algorithm isn't quite complete. Think of this query: ```ts user.whereExists('comments', q => q.whereCreated(gt(date))) ``` The selectivity on `comments` may be high but if users have created many comments, this increases the likelihood of finding a match. **Example** Say we have users `1,2,3,4,5` And a comment table. Below are user ids from the comment table, assuming every user created 2 comments. ``` [4, 1, 3, 5, 2, 1, 5, 4, 3, 2] ``` If our filters decimated the table to only half the rows: ``` [1, 5, 4, 3, 2] ``` We have 0.5 selectivity so an estimated scan of `1/0.5 = 2` over `user` to find a match. But we see that all user ids are present in the set! We only scan 1 user row to find a matching comment. This faster matching is because we have a higher fanout from user->comment than 1. So we need to do: `selectivity = 1 - Math.pow(1 - filterSelectivity, fanout) = 0.75 -> scan_est = 1 / 0.75 = 1.33` 1.33 being closer to 1. More on this is documented in packages/zql/src/planner/SELECTIVITY_PLAN.md **Future notes to self:** - [ ] should we throw out filters that are not indexed by sqlite when computing selectivity? - [ ] we currently penalize the plan if it creates a temp b-tree. This penalty is not applied in our `parentCost.limit / childCost.selectivity` algorithm. Postgres has a separate cost which is "startup_cost" that captures things like creating temp indices. - [x] consider how join keys impact selectivity. Likely need to recompute selectivity after constraint prop.
1 parent 142fe5f commit 341dd8f

17 files changed

+853
-14
lines changed

packages/zql-benchmarks/src/planner-hydration.bench.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,7 @@ function benchmarkQuery<TTable extends keyof typeof schema.tables & string>(
7575

7676
// Map to server names, plan, then map back to client names
7777
const mappedAST = mapAST(unplannedAST, clientToServerMapper);
78+
7879
const plannedServerAST = planQuery(mappedAST, costModel);
7980
const plannedClientAST = mapAST(plannedServerAST, serverToClientMapper);
8081

packages/zql-integration-tests/src/chinook/planner.pg.test.ts

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,9 @@ describe('Chinook planner tests', () => {
7171

7272
test('playlist with track', () => {
7373
const ast = getPlanAST(queries.sqlite.playlist.whereExists('tracks'));
74-
expect(pick(ast, ['where', 'flip'])).toBe(true);
74+
// No flip because:
75+
// all playlists have a track so we must scan all playlists!
76+
expect(pick(ast, ['where', 'flip'])).toBe(false);
7577
expect(pick(ast, ['where', 'related', 'subquery', 'where', 'flip'])).toBe(
7678
false,
7779
);
@@ -80,8 +82,14 @@ describe('Chinook planner tests', () => {
8082
test('tracks with playlist', () => {
8183
const ast = getPlanAST(queries.sqlite.track.whereExists('playlists'));
8284

83-
// playlist table is smaller. Should be flipped.
84-
expect(pick(ast, ['where', 'flip'])).toBe(true);
85+
// TODO:
86+
// This is where SELECTIVITY_PLAN.md would help.
87+
// We assume a selectivity of 1 on playlist_track and playlist
88+
// because there are no filters.
89+
// but a track may not be part of a playlist!
90+
// If many tracks are not part of any playlist
91+
// then flipping would be better.
92+
expect(pick(ast, ['where', 'flip'])).toBe(false);
8593
});
8694

8795
test('has album a or album b', () => {
Lines changed: 336 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,336 @@
1+
# Semi-Join Selectivity Forward Plan
2+
3+
## Problem Statement
4+
5+
The current planner implementation conflates two fundamentally different concepts when estimating costs for semi-joins (EXISTS clauses with LIMIT):
6+
7+
1. **Filter Selectivity**: The fraction of child rows that pass filter predicates
8+
2. **Semi-Join Selectivity**: The fraction of parent rows that have at least one matching child
9+
10+
This conflation leads to incorrect cost estimates for queries with LIMIT clauses on the outer table.
11+
12+
## Background
13+
14+
### Example Scenario
15+
16+
Consider the query:
17+
18+
```sql
19+
SELECT * FROM users
20+
WHERE EXISTS (
21+
SELECT 1 FROM posts
22+
WHERE posts.userId = users.id
23+
AND posts.published = true
24+
)
25+
LIMIT 10
26+
```
27+
28+
The planner needs to estimate: **How many users must we scan to find 10 users that have at least one published post?**
29+
30+
### Current Implementation (Incorrect)
31+
32+
```typescript
33+
// In planner-connection.ts:129-139
34+
const costWithFilters = model(table, sort, filters, undefined);
35+
const costWithoutFilters = model(table, sort, undefined, undefined);
36+
this.selectivity = costWithFilters / costWithoutFilters;
37+
38+
// In planner-join.ts:210-214
39+
scanEst = Math.min(scanEst, parentCost.limit / childCost.selectivity);
40+
```
41+
42+
**What it calculates**: If 50% of posts are published → selectivity = 0.5
43+
**What it estimates**: Need to scan 10 / 0.5 = 20 users
44+
45+
**Problem**: This assumes 50% of users have published posts, but the actual probability depends on how posts are distributed across users (the "fan-out").
46+
47+
### The Two Selectivities Compared
48+
49+
| Scenario | Filter Selectivity | Semi-Join Selectivity | Reality |
50+
| --------------------------------------------------------------------------------------------- | ---------------------------------- | ------------------------------------------------- | --------------------------------------- |
51+
| Even distribution<br/>1000 users, 10 posts each<br/>50% published | 0.5<br/>(50% of posts pass filter) | 0.999<br/>(P(user has ≥1 published) = 1 - 0.5^10) | Almost all users have ≥1 published post |
52+
| Skewed distribution<br/>900 users with 0 posts<br/>100 users with 100 posts<br/>50% published | 0.5<br/>(still 50% of posts pass) | 0.1<br/>(only 100/1000 users have any posts) | Only 10% of users have posts at all |
53+
54+
**Key insight**: Filter selectivity tells us about child rows; semi-join selectivity tells us about parent rows. They're only equal when fan-out = 1 (foreign key from parent to child).
55+
56+
## The Fan-Out Formula
57+
58+
The relationship between filter and semi-join selectivity is:
59+
60+
```
61+
semiJoinSelectivity = 1 - (1 - filterSelectivity)^fanOut
62+
```
63+
64+
Where **fan-out** = average number of child rows per parent row.
65+
66+
### Proof Intuition
67+
68+
For a parent with N children, each with independent probability P of matching:
69+
70+
- Probability NO children match: (1 - P)^N
71+
- Probability ≥1 child matches: 1 - (1 - P)^N
72+
73+
### Edge Cases
74+
75+
**When fan-out = 1** (foreign key from parent → child):
76+
77+
```
78+
semiJoinSelectivity = 1 - (1 - filterSelectivity)^1
79+
= filterSelectivity
80+
```
81+
82+
Current implementation is correct! ✅
83+
84+
**When fan-out = 10** (typical one-to-many):
85+
86+
- filterSelectivity = 0.5 → semiJoinSelectivity = 0.999
87+
- filterSelectivity = 0.1 → semiJoinSelectivity = 0.651
88+
- filterSelectivity = 0.01 → semiJoinSelectivity = 0.096
89+
90+
**When fan-out = 100** (high cardinality):
91+
92+
- filterSelectivity = 0.5 → semiJoinSelectivity ≈ 1.0
93+
- filterSelectivity = 0.1 → semiJoinSelectivity ≈ 1.0
94+
- filterSelectivity = 0.01 → semiJoinSelectivity = 0.634
95+
96+
## Solution: Query sqlite_stat1 for Fan-Out
97+
98+
### What sqlite_stat1 Provides
99+
100+
After running `ANALYZE`, SQLite populates `sqlite_stat1` with index statistics:
101+
102+
```sql
103+
SELECT tbl, idx, stat FROM sqlite_stat1;
104+
105+
-- Example result:
106+
-- tbl='posts', idx='idx_posts_userId', stat='10000 100'
107+
-- ^ ^
108+
-- | |
109+
-- total rows avg rows per distinct userId
110+
```
111+
112+
The `stat` column format: `"totalRows avgRowsPerDistinct [avgRowsPerDistinct2 ...]"`
113+
114+
For an index on `posts.userId`:
115+
116+
- First number: Total rows in the index (10,000 posts)
117+
- Second number: Average rows per distinct value of userId (100 posts per user)
118+
- **The second number IS the fan-out!**
119+
120+
### Implementation Strategy
121+
122+
```typescript
123+
function getFanOutFromStats(
124+
db: Database,
125+
tableName: string,
126+
columnName: string,
127+
): number | undefined {
128+
// 1. Find indexes containing the column
129+
const indexes = db
130+
.prepare(
131+
`
132+
SELECT name FROM sqlite_master
133+
WHERE type='index'
134+
AND tbl_name=?
135+
AND sql LIKE '%' || ? || '%'
136+
`,
137+
)
138+
.all(tableName, columnName);
139+
140+
if (indexes.length === 0) {
141+
return undefined; // No index, can't get stats
142+
}
143+
144+
// 2. Query sqlite_stat1 for the first matching index
145+
for (const {name: indexName} of indexes) {
146+
const result = db
147+
.prepare(
148+
`
149+
SELECT stat FROM sqlite_stat1
150+
WHERE tbl=? AND idx=?
151+
`,
152+
)
153+
.get(tableName, indexName);
154+
155+
if (result && result.stat) {
156+
const parts = result.stat.split(' ');
157+
if (parts.length >= 2) {
158+
return parseInt(parts[1], 10); // Average rows per distinct value
159+
}
160+
}
161+
}
162+
163+
return undefined; // Stats not available (ANALYZE not run)
164+
}
165+
```
166+
167+
### Fallback Strategy
168+
169+
```typescript
170+
const DEFAULT_FANOUT = 3; // Conservative middle ground
171+
172+
const fanOut = getFanOutFromStats(db, table, joinColumn) ?? DEFAULT_FANOUT;
173+
const semiJoinSelectivity = 1 - Math.pow(1 - filterSelectivity, fanOut);
174+
```
175+
176+
**Why 3?**
177+
178+
- SQLite's default is 10 (might be too optimistic)
179+
- fan-out = 1 is common for FK relationships
180+
- fan-out = 3 is a conservative middle ground
181+
- Overestimating selectivity → slightly less efficient but still correct
182+
- Underestimating selectivity → risk of bad plans
183+
184+
## Design Decisions
185+
186+
### Default Fan-Out Value
187+
188+
**Options**:
189+
190+
- 1: Conservative, assumes FK relationships (current behavior)
191+
- 3: Moderate, safe middle ground
192+
- 10: SQLite's default, optimistic
193+
194+
**Recommendation**: Use 3
195+
196+
- Safer than 10 for sparse relationships
197+
- More accurate than 1 for typical one-to-many
198+
- Easy to tune based on production data
199+
200+
## Edge Cases and Considerations
201+
202+
### 1. No Index on Join Column
203+
204+
If the child table has no index on the join column:
205+
206+
- `getFanOutFromStats()` returns `undefined`
207+
- Fall back to `DEFAULT_FANOUT`
208+
- Cost estimates may be less accurate but still safe
209+
210+
### 2. ANALYZE Not Run
211+
212+
If `sqlite_stat1` is empty:
213+
214+
- `getFanOutFromStats()` returns `undefined`
215+
- Fall back to `DEFAULT_FANOUT`
216+
- Document recommendation to run ANALYZE periodically
217+
218+
### 3. Composite Indexes
219+
220+
If join column is part of a multi-column index:
221+
222+
- `sqlite_stat1` provides stats at each position
223+
- Use the stat corresponding to the join column's position
224+
- May require more sophisticated parsing
225+
226+
### 4. Multiple Indexes on Same Column
227+
228+
If multiple indexes contain the column:
229+
230+
- Use first index found (arbitrary but consistent)
231+
- Could enhance to prefer indexes where column is leftmost
232+
233+
### 5. Zero Selectivity
234+
235+
Current code has a bug where `semiJoinSelectivity = 0` (EXISTS can never succeed):
236+
237+
```typescript
238+
if (childCost.semiJoinSelectivity === 0) {
239+
// EXISTS can never succeed - result set is empty
240+
return {
241+
baseCardinality: 0,
242+
runningCost: parentCost.runningCost, // Cost to discover emptiness
243+
filterSelectivity: 0,
244+
semiJoinSelectivity: 0,
245+
limit: parentCost.limit,
246+
};
247+
}
248+
```
249+
250+
This should be fixed in the same PR.
251+
252+
### 6. NOT EXISTS
253+
254+
For `NOT EXISTS`, we don't short-circuit on first match:
255+
256+
- Must scan ALL children to verify none match
257+
- Don't apply limit or use semi-join selectivity
258+
- Current code correctly doesn't set limit for NOT EXISTS
259+
260+
## Future Enhancements
261+
262+
### Enhancement 1: Use sqlite_stat4 for Distribution
263+
264+
**Problem**: `sqlite_stat1` provides average fan-out, but distributions can be highly skewed.
265+
266+
**Solution**: Sample actual distribution from `sqlite_stat4`:
267+
268+
```typescript
269+
function getSemiJoinSelectivityWithStat4(
270+
db: Database,
271+
childTable: string,
272+
childColumn: string,
273+
filterSelectivity: number,
274+
): number {
275+
// Query sqlite_stat4 for distribution samples
276+
const samples = db
277+
.prepare(
278+
`
279+
SELECT CAST(neq AS INTEGER) as fanOut
280+
FROM sqlite_stat4
281+
WHERE tbl=? AND idx LIKE '%' || ? || '%'
282+
`,
283+
)
284+
.all(childTable, childColumn);
285+
286+
if (samples.length === 0) {
287+
return fallbackToStat1();
288+
}
289+
290+
// Calculate selectivity for each sample's fan-out
291+
const selectivities = samples.map(
292+
s => 1 - Math.pow(1 - filterSelectivity, s.fanOut),
293+
);
294+
295+
// Average across the distribution
296+
return selectivities.reduce((a, b) => a + b, 0) / selectivities.length;
297+
}
298+
```
299+
300+
**Challenge**: `stat4` only samples rows that exist in the index. Parents with 0 children don't appear. Need to adjust for missing parents.
301+
302+
## References
303+
304+
### PostgreSQL Documentation
305+
306+
- [Semi-Join Planning](https://postgrespro.com/blog/pgsql/5969618)
307+
- [compute_semi_anti_join_factors](https://doxygen.postgresql.org/costsize_8c.html)
308+
- Calculates `outer_match_frac`: fraction of outer rows that have matches
309+
- Uses NDV (number of distinct values) from statistics
310+
311+
### SQLite Documentation
312+
313+
- [sqlite_stat1 Format](https://sqlite.org/fileformat2.html#stat1tab)
314+
- [Query Planning Overview](https://sqlite.org/optoverview.html)
315+
- [ANALYZE Command](https://sqlite.org/lang_analyze.html)
316+
317+
### Academic References
318+
319+
- "Access Path Selection in a Relational Database Management System" (Selinger et al., 1979)
320+
- Original paper on cost-based optimization
321+
- Discusses selectivity estimation for joins
322+
323+
## Open Questions
324+
325+
1. **Should we fix zero selectivity edge case in same PR?**
326+
- Pro: Related to selectivity handling
327+
- Con: Separate concern, could be separate PR
328+
329+
2. **What's the right default fan-out?**
330+
- 1: Current behavior (conservative for FK, wrong for one-to-many)
331+
- 3: Moderate (recommended)
332+
- 10: SQLite's default (optimistic)
333+
334+
3. **Should we add warnings/logging when falling back to default?**
335+
- Could help users identify missing ANALYZE or indexes
336+
- Might be noisy for intentionally simple cost models

0 commit comments

Comments
 (0)