improve performance by introducing composite index for datachunks #9293
Conversation
Walkthrough
The pull request expands a composite index in

Changes
Sequence Diagram(s)

sequenceDiagram
rect rgb(250,250,255)
participant Upgrader
participant UpgradeScript
participant DB as Database
note right of UpgradeScript: Drops old btree index before bootstrap recreates composite index
Upgrader->>UpgradeScript: run({ dbg, db_client })
UpgradeScript->>DB: execute "DROP INDEX IF EXISTS idx_btree_datachunks_dedup_key"
alt success
DB-->>UpgradeScript: OK
UpgradeScript-->>Upgrader: log success
else error
DB-->>UpgradeScript: ERROR
UpgradeScript-->>Upgrader: dbg.error(...) and rethrow
end
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1)
4-15: Upgrade script behavior looks good; consider fixing small log typo

The script correctly drops the old index in an idempotent way and surfaces errors via dbg.error and rethrow, which is appropriate for an upgrade step. One small polish item: the error message has a typo ("ocurred" → "occurred"), which you may want to fix for clearer logs. A minimal diff for the message:

- dbg.error('An error ocurred in the upgrade process:', err);
+ dbg.error('An error occurred in the upgrade process:', err);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/server/object_services/schemas/data_chunk_indexes.js (1 hunks)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-11T06:12:12.318Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-26T06:29:52.095Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9195
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:25-31
Timestamp: 2025-08-26T06:29:52.095Z
Learning: In noobaa-core upgrade scripts, there is only one internal/mongo pool per system, so removing mongo_pools[0] is sufficient and correct rather than handling multiple pools.
Applied to files:
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
🔇 Additional comments (1)
src/server/object_services/schemas/data_chunk_indexes.js (1)
18-22: Verification confirms the original concern: dedup_key-only queries will lose index optimization

The ripgrep output reveals critical queries in get_aprox_dedup_keys_number() (md_store.js lines 1734–1761) that filter and sort only on dedup_key, without system or bucket:

- Line 1740: estimatedQueryCount({ dedup_key: { $exists: true } })
- Line 1747: Range query dedup_key: { $lt: marker, $exists: true }
- Lines 1751, 1754: Sorting by dedup_key: 1 and dedup_key: -1

Since the new index has leading columns (system, bucket), these dedup_key-only queries will not benefit from it. The old single-column index was optimal for this access pattern. This function is called by the dedup_indexer background service (line 39 in dedup_indexer.js), so the performance impact is in production code.

Before rollout, verify either: (1) that these dedup_key-only queries are not performance-sensitive, or (2) that a separate single-column dedup_key index will be retained or created to maintain performance.
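If option (2) were chosen, a follow-up upgrade step could recreate a single-column dedup_key index alongside the composite one. A minimal sketch in the style of the upgrade script in this PR; the index name and the JSONB column expression here are assumptions for illustration, not the actual schema:

```javascript
// Hypothetical follow-up upgrade step that recreates a single-column
// dedup_key index for the dedup_indexer access pattern. The index name
// and the data->>'dedup_key' expression are assumptions, not real schema.
async function run({ dbg, db_client }) {
    const indexName = 'idx_btree_datachunks_dedup_key_only';
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(
            `CREATE INDEX IF NOT EXISTS ${indexName} ` +
            `ON datachunks ((data->>'dedup_key'));`
        );
        dbg.log0('Executed upgrade script for creating index ', indexName);
    } catch (err) {
        dbg.error('An error occurred while creating index', indexName, err);
        throw err;
    }
}
```

CREATE INDEX IF NOT EXISTS keeps the step idempotent, mirroring the DROP INDEX IF EXISTS used by the removal script.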
Note: Docstrings generation - SUCCESS
Docstrings generation was requested by @karthikps97.

* #9293 (comment)

The following files were modified:
* `src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js`
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(`DROP INDEX IF EXISTS ${indexName};`);
        dbg.log0("Executed upgrade script for dropping index ", indexName);
    } catch (err) {
        dbg.error('An error ocurred in the upgrade process:', err);
        throw err;
    }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tighten error logging message (typo and missing context).
Functionally this looks good and idempotent; only nit is the error log text. There’s a typo and the message doesn’t mention which index failed, which can hinder debugging.
- } catch (err) {
- dbg.error('An error ocurred in the upgrade process:', err);
- throw err;
- }
+ } catch (err) {
+ dbg.error('An error occurred while dropping index', indexName, err);
+ throw err;
+ }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    try {
        const pool = db_client.instance().get_pool();
        await pool.query(`DROP INDEX IF EXISTS ${indexName};`);
        dbg.log0("Executed upgrade script for dropping index ", indexName);
    } catch (err) {
        dbg.error('An error occurred while dropping index', indexName, err);
        throw err;
    }
🤖 Prompt for AI Agents
In src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js around lines 9
to 17, the catch block's log message has a typo ("ocurred") and lacks context
about which index failed; update the dbg.error call to use the correct spelling
("occurred") and include the indexName and the error object in the message so
the log clearly shows which index failed and why (e.g., construct a single error
string or pass indexName and err to dbg.error).
We need to test the upgrade from 4.20 to 4.21 with a lot of records.
Tested with 1 million+ records in datachunks. These are the results:

Query:

Before (with only the dedup_key index):
_Sort (cost=184.34..184.34 rows=1 width=1418) (actual time=37.361..37.362 rows=4 loops=1)

After (dedup_key, system, bucket):
_Sort (cost=7.78..7.79 rows=1 width=1443) (actual time=0.067..0.067 rows=4 loops=1)
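Plans like the ones above can be reproduced with EXPLAIN ANALYZE. A hedged sketch using the same pool interface as the upgrade script; the table name and JSONB column expressions are assumptions for illustration:

```javascript
// Sketch: run EXPLAIN ANALYZE for the dedup lookup pattern (filter by
// system, bucket and dedup_key) and return the plan text. The datachunks
// table name and data->>'...' expressions are assumptions, not real schema.
async function explainDedupLookup(pool, { system, bucket, dedup_key }) {
    const sql =
        'EXPLAIN ANALYZE SELECT * FROM datachunks ' +
        "WHERE data->>'system' = $1 AND data->>'bucket' = $2 " +
        "AND data->>'dedup_key' = $3";
    const res = await pool.query(sql, [system, bucket, dedup_key]);
    // Each row of an EXPLAIN result holds one plan line under 'QUERY PLAN'.
    return res.rows.map(row => row['QUERY PLAN']).join('\n');
}
```

Comparing the returned plan text before and after the index change shows whether the planner picks an index scan or falls back to a sequential scan.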
…ite index Signed-off-by: Karthik P S <[email protected]>
Force-pushed 357104e to e937e8b.
dannyzaken left a comment:
@karthikps97, this looks good by itself.
Can you please go over the code in src/server/bg_services/dedup_indexer.js? This is a background worker intended to control the size of the dedup index by removing the dedup_key from chunks when they reach a specific size. This is old code from when we still used MongoDB, and it hasn't been touched much since. Try looking into the queries it does and make sure your changes did not affect it for the worse.
Unrelated to that, we should probably reconsider its purpose.
I went through the module. But since we are using postgres rather than MongoDB, the
Describe the Problem
When dedup checks for existing chunks in the system, the query filters by system, dedup_key, and bucket. When the table is large, the existing index is not being used by the pg planner.
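For illustration, the composite index could be expressed as SQL along these lines; the index name, the column order, and the JSONB expressions are assumptions based on the review discussion, not the actual schema file:

```javascript
// Hypothetical SQL for a composite index covering the dedup lookup filter.
// The name, the (system, bucket, dedup_key) ordering, and the partial-index
// predicate are illustrative assumptions only.
const createCompositeIndexSQL =
    'CREATE INDEX IF NOT EXISTS idx_datachunks_system_bucket_dedup_key ' +
    "ON datachunks ((data->>'system'), (data->>'bucket'), (data->>'dedup_key')) " +
    "WHERE data->>'dedup_key' IS NOT NULL;";
console.log(createCompositeIndexSQL);
```

A partial index predicate like the one sketched keeps the index small by excluding chunks whose dedup_key was already removed, which matches the dedup lookup's filter.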
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit