improve performance by introducing composite index for datachunks #9293
Conversation
Walkthrough
The pull request expands a composite index in

Changes
Sequence Diagram(s)

sequenceDiagram
rect rgb(250,250,255)
participant Upgrader
participant UpgradeScript
participant DB as Database
note right of UpgradeScript: Drops old btree index before bootstrap recreates composite index
Upgrader->>UpgradeScript: run({ dbg, db_client })
UpgradeScript->>DB: execute "DROP INDEX IF EXISTS idx_btree_datachunks_dedup_key"
alt success
DB-->>UpgradeScript: OK
UpgradeScript-->>Upgrader: log success
else error
DB-->>UpgradeScript: ERROR
UpgradeScript-->>Upgrader: dbg.error(...) and rethrow
end
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1)
4-15: Upgrade script behavior looks good; consider fixing small log typo

The script correctly drops the old index in an idempotent way and surfaces errors via dbg.error and rethrow, which is appropriate for an upgrade step. One small polish item: the error message has a typo ("ocurred" → "occurred"), which you may want to fix for clearer logs. A minimal diff for the message:

- dbg.error('An error ocurred in the upgrade process:', err);
+ dbg.error('An error occurred in the upgrade process:', err);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/server/object_services/schemas/data_chunk_indexes.js (1 hunks)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-11T06:12:12.318Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-26T06:29:52.095Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9195
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:25-31
Timestamp: 2025-08-26T06:29:52.095Z
Learning: In noobaa-core upgrade scripts, there is only one internal/mongo pool per system, so removing mongo_pools[0] is sufficient and correct rather than handling multiple pools.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
🔇 Additional comments (1)
src/server/object_services/schemas/data_chunk_indexes.js (1)
18-22: Verification confirms the original concern: dedup_key-only queries will lose index optimization

The ripgrep output reveals critical queries in get_aprox_dedup_keys_number() (md_store.js lines 1734–1761) that filter and sort only on dedup_key, without system or bucket:

- Line 1740: estimatedQueryCount({ dedup_key: { $exists: true } })
- Line 1747: Range query dedup_key: { $lt: marker, $exists: true }
- Lines 1751, 1754: Sorting by dedup_key: 1 and dedup_key: -1

Since the new index has leading columns (system, bucket), these dedup_key-only queries will not benefit from it. The old single-column index was optimal for this access pattern. This function is called by the dedup_indexer background service (line 39 in dedup_indexer.js), so the performance impact is in production code.

Before rollout, verify either: (1) that these dedup_key-only queries are not performance-sensitive, or (2) that a separate single-column dedup_key index will be retained or created to maintain performance.
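If option (2) were chosen, a follow-up upgrade step could recreate a single-column dedup_key index alongside the composite one. A minimal sketch in the style of the upgrade script in this PR; the index name and the JSONB column expression here are assumptions for illustration, not the actual schema:

```javascript
// Hypothetical follow-up upgrade step that recreates a single-column
// dedup_key index for the dedup_indexer access pattern. The index name
// and the data->>'dedup_key' expression are assumptions, not real schema.
async function run({ dbg, db_client }) {
    const indexName = 'idx_btree_datachunks_dedup_key_only';
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(
            `CREATE INDEX IF NOT EXISTS ${indexName} ` +
            `ON datachunks ((data->>'dedup_key'));`
        );
        dbg.log0('Executed upgrade script for creating index ', indexName);
    } catch (err) {
        dbg.error('An error occurred while creating index', indexName, err);
        throw err;
    }
}
```

CREATE INDEX IF NOT EXISTS keeps the step idempotent, mirroring the DROP INDEX IF EXISTS used by the removal script.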
Note: Docstrings generation - SUCCESS
Docstrings generation was requested by @karthikps97.

* #9293 (comment)

The following files were modified:
* `src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js`
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(`DROP INDEX IF EXISTS ${indexName};`);
        dbg.log0("Executed upgrade script for dropping index ", indexName);
    } catch (err) {
        dbg.error('An error ocurred in the upgrade process:', err);
        throw err;
    }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tighten error logging message (typo and missing context).
Functionally this looks good and idempotent; only nit is the error log text. There’s a typo and the message doesn’t mention which index failed, which can hinder debugging.
- } catch (err) {
- dbg.error('An error ocurred in the upgrade process:', err);
- throw err;
- }
+ } catch (err) {
+ dbg.error('An error occurred while dropping index', indexName, err);
+ throw err;
+ }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(`DROP INDEX IF EXISTS ${indexName};`);
        dbg.log0("Executed upgrade script for dropping index ", indexName);
    } catch (err) {
        dbg.error('An error occurred while dropping index', indexName, err);
        throw err;
    }
🤖 Prompt for AI Agents
In src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js around lines 9
to 17, the catch block's log message has a typo ("ocurred") and lacks context
about which index failed; update the dbg.error call to use the correct spelling
("occurred") and include the indexName and the error object in the message so
the log clearly shows which index failed and why (e.g., construct a single error
string or pass indexName and err to dbg.error).
We need to test the upgrade from 4.20 to 4.21 with a lot of records.
Tested with 1 million+ records in datachunks. These are the results:

Query:

Before (with only the dedup_key index):
_Sort (cost=184.34..184.34 rows=1 width=1418) (actual time=37.361..37.362 rows=4 loops=1)

After (dedup_key, system, bucket):
_Sort (cost=7.78..7.79 rows=1 width=1443) (actual time=0.067..0.067 rows=4 loops=1)
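Plans like the ones above can be reproduced with EXPLAIN ANALYZE. A hedged sketch using the same pool interface as the upgrade script; the table name and JSONB column expressions are assumptions for illustration:

```javascript
// Sketch: run EXPLAIN ANALYZE for the dedup lookup pattern (filter by
// system, bucket and dedup_key) and return the plan text. The datachunks
// table name and data->>'...' expressions are assumptions, not real schema.
async function explainDedupLookup(pool, { system, bucket, dedup_key }) {
    const sql =
        'EXPLAIN ANALYZE SELECT * FROM datachunks ' +
        "WHERE data->>'system' = $1 AND data->>'bucket' = $2 " +
        "AND data->>'dedup_key' = $3";
    const res = await pool.query(sql, [system, bucket, dedup_key]);
    // Each row of an EXPLAIN result holds one plan line under 'QUERY PLAN'.
    return res.rows.map(row => row['QUERY PLAN']).join('\n');
}
```

Comparing the returned plan text before and after the index change shows whether the planner picks an index scan or falls back to a sequential scan.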
…ite index Signed-off-by: Karthik P S <[email protected]>
Force-pushed 357104e to e937e8b.
dannyzaken left a comment:
@karthikps97, this looks good by itself.
Can you please go over the code in src/server/bg_services/dedup_indexer.js? This is a background worker intended to control the size of the dedup index by removing the dedup_key from chunks when they reach a specific size. This is old code from when we still used MongoDB, and it hasn't been touched much since. Try looking into the queries it does and make sure your changes did not affect it for the worse.
Unrelated to that, we should probably reconsider its purpose.
I went through the module. But since we are using postgres rather than MongoDB, the
Describe the Problem
When dedup checks for existing chunks in the system, the query filters by system, dedup_key, and bucket. When the table is large, the existing index is not being used by the pg planner.
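For illustration, the composite index could be expressed as SQL along these lines; the index name, the column order, and the JSONB expressions are assumptions based on the review discussion, not the actual schema file:

```javascript
// Hypothetical SQL for a composite index covering the dedup lookup filter.
// The name, the (system, bucket, dedup_key) ordering, and the partial-index
// predicate are illustrative assumptions only.
const createCompositeIndexSQL =
    'CREATE INDEX IF NOT EXISTS idx_datachunks_system_bucket_dedup_key ' +
    "ON datachunks ((data->>'system'), (data->>'bucket'), (data->>'dedup_key')) " +
    "WHERE data->>'dedup_key' IS NOT NULL;";
console.log(createCompositeIndexSQL);
```

A partial index predicate like the one sketched keeps the index small by excluding chunks whose dedup_key was already removed, which matches the dedup lookup's filter.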
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit