Conversation

@karthikps97 (Member) commented Nov 20, 2025

Describe the Problem

When dedup checks for existing chunks, the query filters by system, dedup_key, and bucket. Once the table grows large, the PostgreSQL planner stops using the existing index for this query.
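The lookup described above has roughly the following shape. This is a sketch reconstructed from the EXPLAIN ANALYZE query shared later in this thread; the helper name and parameter placeholders are illustrative, not the actual md_store.js code.

```javascript
// Hypothetical helper, for illustration only: builds the approximate SQL
// that the dedup check issues against the JSONB `data` column of the
// `datachunks` table. The $1/$2/$3 placeholders are assumptions.
function build_dedup_lookup_sql() {
    return [
        "SELECT * FROM datachunks",
        "WHERE data->>'system' = $1",
        "  AND data->>'bucket' = $2",
        "  AND data->>'dedup_key' = ANY($3)",
        "  AND data ? 'dedup_key'",
        "  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)",
        "ORDER BY data->>'_id' DESC",
    ].join('\n');
}

console.log(build_dedup_lookup_sql());
```

All three equality predicates (system, bucket, dedup_key) are expressions over the JSONB column, which is why the fix below is a composite expression index rather than a plain column index.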

Explain the Changes

  1. Added an upgrade script that drops the index 'idx_btree_datachunks_dedup_key'.
  2. A new composite index (system, bucket, dedup_key) is created during bootstrapping.
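As a sketch of the two states (illustrative only; the real definitions live in src/server/object_services/schemas/data_chunk_indexes.js and may use different option keys). Note that the field order shown here follows the code ({ dedup_key, system, bucket }), which differs from the (system, bucket, dedup_key) order written in this PR description — the review below discusses that discrepancy.

```javascript
// Illustrative sketch of the old single-field index vs. the new composite
// index. The `fields` shape mimics the schema file's style; exact option
// names are assumptions.
const old_dedup_index = { fields: { dedup_key: 1 } };
const new_dedup_index = { fields: { dedup_key: 1, system: 1, bucket: 1 } };

console.log(Object.keys(new_dedup_index.fields).join(','));
// → dedup_key,system,bucket
```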

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  1. Upgrade noobaa from 4.20 to 4.21.
  2. After a successful upgrade, compare the time taken by uploads with dedup enabled.
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Chores
    • Updated a database index to be a composite index including additional fields for more targeted query behavior.
    • Added an upgrade script to remove a deprecated index during the v5.21.0 migration to ensure smoother upgrades.


@coderabbitai

coderabbitai bot commented Nov 20, 2025

Walkthrough

The pull request expands a composite index in src/server/object_services/schemas/data_chunk_indexes.js to include system and bucket alongside dedup_key, and adds an upgrade script src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js that drops the old idx_btree_datachunks_dedup_key index during migration.

Changes

Index Schema Update — src/server/object_services/schemas/data_chunk_indexes.js
Modified the second index definition: changed the indexed fields from { dedup_key: 1 } to { dedup_key: 1, system: 1, bucket: 1 }. Index name and uniqueness remain unchanged; partialFilterExpression unchanged.

Migration Script — src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
Added upgrade script exporting run and description; run obtains the DB pool and executes DROP INDEX IF EXISTS idx_btree_datachunks_dedup_key, with logging and error handling.

Sequence Diagram(s)

sequenceDiagram
    rect rgb(250,250,255)
    participant Upgrader
    participant UpgradeScript
    participant DB as Database
    note right of UpgradeScript: Drops old btree index before bootstrap recreates composite index
    Upgrader->>UpgradeScript: run({ dbg, db_client })
    UpgradeScript->>DB: execute "DROP INDEX IF EXISTS idx_btree_datachunks_dedup_key"
    alt success
        DB-->>UpgradeScript: OK
        UpgradeScript-->>Upgrader: log success
    else error
        DB-->>UpgradeScript: ERROR
        UpgradeScript-->>Upgrader: dbg.error(...) and rethrow
    end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review points:
    • Confirm composite index field order aligns with expected query patterns.
    • Ensure migration runs in the correct upgrade phase and has necessary DB privileges.
    • Verify logging and error handling in remove_datachunks_index.js.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately captures the main change: introducing a composite index for datachunks to improve performance, which is the core objective of the pull request.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 357104e and e937e8b.

📒 Files selected for processing (2)
  • src/server/object_services/schemas/data_chunk_indexes.js (1 hunks)
  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-11T06:12:12.318Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.

Applied to files:

  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
📚 Learning: 2025-08-26T06:29:52.095Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9195
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:25-31
Timestamp: 2025-08-26T06:29:52.095Z
Learning: In noobaa-core upgrade scripts, there is only one internal/mongo pool per system, so removing mongo_pools[0] is sufficient and correct rather than handling multiple pools.

Applied to files:

  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/server/object_services/schemas/data_chunk_indexes.js (1)

18-22: I encountered persistent repository access issues that prevent me from executing shell scripts to verify the query patterns. However, based on the information provided in the review comment itself, I can draw some important observations:

Key observation: The test index name referenced in the PR comments is idx_btree_datachunks_dedup_key_system_bucket, which matches the code order (dedup_key, system, bucket), not the PR description order (system, bucket, dedup_key). The substantial performance improvement (from ~37.383 ms to ~0.085 ms) suggests the current field order is effective.

However, without access to the actual query patterns in the codebase, I cannot definitively verify whether the current order is optimal or if reordering would provide better selectivity. Here is the rewritten review comment:


Verify that the index field order aligns with the typical query predicates.

The PR description states the composite index is on (system, bucket, dedup_key), but the code defines { dedup_key: 1, system: 1, bucket: 1 }. The test results show the index performs well (37.383 ms → 0.085 ms improvement), and the test index name idx_btree_datachunks_dedup_key_system_bucket reflects the code's field order, suggesting intentional placement. However, confirm that this field order matches typical dedup check query patterns. If queries predominantly filter by system first, consider whether { system: 1, bucket: 1, dedup_key: 1 } would improve index selectivity, or document why the current order is optimal.
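The leading-column concern above can be made concrete with a toy model. This is a deliberately simplified illustration of the b-tree prefix rule, not NooBaa code, and Postgres has more nuance (it can still scan an index using non-leading predicates as filters), but efficient index access needs the query's predicates to cover a prefix of the index columns:

```javascript
// Toy model of the b-tree leading-column rule: returns the longest prefix
// of the index's columns that the query's equality predicates cover.
function usable_prefix(indexCols, queryCols) {
    const q = new Set(queryCols);
    let n = 0;
    while (n < indexCols.length && q.has(indexCols[n])) n += 1;
    return indexCols.slice(0, n);
}

// With dedup_key leading (the code's order), a dedup_key-only query still
// matches the first column:
console.log(usable_prefix(['dedup_key', 'system', 'bucket'], ['dedup_key']));
// → [ 'dedup_key' ]

// With system leading (the PR-description order), the same query matches
// no prefix at all:
console.log(usable_prefix(['system', 'bucket', 'dedup_key'], ['dedup_key']));
// → []
```

This is why the code's field order also keeps dedup_key-only queries (such as those in get_aprox_dedup_keys_number, discussed in a later review comment) able to use the index.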




@karthikps97 karthikps97 linked an issue Nov 20, 2025 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1)

4-15: Upgrade script behavior looks good; consider fixing small log typo

The script correctly drops the old index in an idempotent way and surfaces errors via dbg.error and rethrow, which is appropriate for an upgrade step. One small polish item: the error message has a typo ("ocurred" → "occurred"), which you may want to fix for clearer logs.

A minimal diff for the message:

-    dbg.error('An error ocurred in the upgrade process:', err);
+    dbg.error('An error occurred in the upgrade process:', err);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 418bb3b and 3d9f7ac.

📒 Files selected for processing (2)
  • src/server/object_services/schemas/data_chunk_indexes.js (1 hunks)
  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)
🔇 Additional comments (1)
src/server/object_services/schemas/data_chunk_indexes.js (1)

18-22: Verification confirms the original concern: dedup_key-only queries will lose index optimization

The ripgrep output reveals critical queries in get_aprox_dedup_keys_number() (md_store.js lines 1734–1761) that filter and sort only on dedup_key, without system or bucket:

  • Line 1740: estimatedQueryCount({ dedup_key: { $exists: true } })
  • Line 1747: Range query dedup_key: { $lt: marker, $exists: true }
  • Lines 1751, 1754: Sorting by dedup_key: 1 and dedup_key: -1

Since the new index has leading columns (system, bucket), these dedup_key-only queries will not benefit from it. The old single-column index was optimal for this access pattern. This function is called by the dedup_indexer background service (line 39 in dedup_indexer.js), so the performance impact is in production code.

Before rollout, verify either: (1) that these dedup_key-only queries are not performance-sensitive, or (2) that a separate single-column dedup_key index will be retained or created to maintain performance.

@coderabbitai

coderabbitai bot commented Nov 20, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #9296

coderabbitai bot added a commit that referenced this pull request Nov 20, 2025
Docstrings generation was requested by @karthikps97.

* #9293 (comment)

The following files were modified:

* `src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js`

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d9f7ac and 39465ee.

📒 Files selected for processing (1)
  • src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js (1 hunks)

Comment on lines +9 to +17
try {
    const pool = db_client.instance().get_pool();
    await pool.query(`DROP INDEX IF EXISTS ${indexName};`);

    dbg.log0("Executed upgrade script for dropping index ", indexName);
} catch (err) {
    dbg.error('An error ocurred in the upgrade process:', err);
    throw err;
}

⚠️ Potential issue | 🟡 Minor

Tighten error logging message (typo and missing context).

Functionally this looks good and idempotent; only nit is the error log text. There’s a typo and the message doesn’t mention which index failed, which can hinder debugging.

-  } catch (err) {
-    dbg.error('An error ocurred in the upgrade process:', err);
-    throw err;
-  }
+  } catch (err) {
+    dbg.error('An error occurred while dropping index', indexName, err);
+    throw err;
+  }
🤖 Prompt for AI Agents
In src/upgrade/upgrade_scripts/5.21.0/remove_datachunks_index.js around lines 9
to 17, the catch block's log message has a typo ("ocurred") and lacks context
about which index failed; update the dbg.error call to use the correct spelling
("occurred") and include the indexName and the error object in the message so
the log clearly shows which index failed and why (e.g., construct a single error
string or pass indexName and err to dbg.error).

@liranmauda (Contributor)

We need to test the upgrade from 4.20 to 4.21 with a lot of records

@karthikps97 (Member, Author)

Tested with 1 million+ records in datachunks. These are the results:

Query:

explain analyze
SELECT * FROM datachunks
WHERE data->>'system' = '69252cdff617af002209ecd5'
  AND data->>'bucket' = '69252cdff617af002209ecdd'
  AND data->>'dedup_key' IN ('WmlmtwV50YV//LfuCIm+EIlmbLdhUfCs0WJj+yp0l+0=','3OfkqbNFPXrU4odaUHK6A+mu7CwSVmnc09nVdLE0M2A=','WxcA5eoNJEYMic3n/qlU/og4PTP1bYXSDtA4E3m5YJc=','d4k9vAgAlFwviIWz/wFY0q3sECypiiCi6pF6T0fxhD0=')
  AND data ? 'dedup_key'
  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)
ORDER BY data->>'_id' DESC;

Before (with only the dedup_key index):

Sort  (cost=184.34..184.34 rows=1 width=1418) (actual time=37.361..37.362 rows=4 loops=1)
  Sort Key: ((data ->> '_id'::text)) DESC
  Sort Method: quicksort  Memory: 30kB
  ->  Bitmap Heap Scan on datachunks  (cost=182.05..184.33 rows=1 width=1418) (actual time=37.352..37.355 rows=4 loops=1)
        Recheck Cond: (((data ->> 'dedup_key'::text) = ANY ('{WmlmtwV50YV//LfuCIm+EIlmbLdhUfCs0WJj+yp0l+0=,3OfkqbNFPXrU4odaUHK6A+mu7CwSVmnc09nVdLE0M2A=,WxcA5eoNJEYMic3n/qlU/og4PTP1bYXSDtA4E3m5YJc=,d4k9vAgAlFwviIWz/wFY0q3sECypiiCi6pF6T0fxhD0=}'::text[])) AND (data ? 'dedup_key'::text) AND (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb)))
        Filter: (((data ->> 'system'::text) = '69252cdff617af002209ecd5'::text) AND ((data ->> 'bucket'::text) = '69252cdff617af002209ecdd'::text))
        Heap Blocks: exact=2
        ->  BitmapAnd  (cost=182.05..182.05 rows=2 width=0) (actual time=37.342..37.343 rows=0 loops=1)
              ->  Bitmap Index Scan on idx_btree_datachunks_dedup_key  (cost=0.00..7.66 rows=207 width=0) (actual time=0.027..0.027 rows=4 loops=1)
                    Index Cond: ((data ->> 'dedup_key'::text) = ANY ('{WmlmtwV50YV//LfuCIm+EIlmbLdhUfCs0WJj+yp0l+0=,3OfkqbNFPXrU4odaUHK6A+mu7CwSVmnc09nVdLE0M2A=,WxcA5eoNJEYMic3n/qlU/og4PTP1bYXSDtA4E3m5YJc=,d4k9vAgAlFwviIWz/wFY0q3sECypiiCi6pF6T0fxhD0=}'::text[]))
              ->  Bitmap Index Scan on idx_btree_datachunks_id_desc  (cost=0.00..174.14 rows=10323 width=0) (actual time=37.312..37.312 rows=1209940 loops=1)
Planning Time: 0.071 ms
Execution Time: 37.383 ms

After (with the composite (dedup_key, system, bucket) index):

Sort  (cost=7.78..7.79 rows=1 width=1443) (actual time=0.067..0.067 rows=4 loops=1)
  Sort Key: ((data ->> '_id'::text)) DESC
  Sort Method: quicksort  Memory: 30kB
  ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks  (cost=0.55..7.77 rows=1 width=1443) (actual time=0.023..0.053 rows=4 loops=1)
        Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{WmlmtwV50YV//LfuCIm+EIlmbLdhUfCs0WJj+yp0l+0=,3OfkqbNFPXrU4odaUHK6A+mu7CwSVmnc09nVdLE0M2A=,WxcA5eoNJEYMic3n/qlU/og4PTP1bYXSDtA4E3m5YJc=,d4k9vAgAlFwviIWz/wFY0q3sECypiiCi6pF6T0fxhD0=}'::text[])) AND ((data ->> 'system'::text) = '69252cdff617af002209ecd5'::text) AND ((data ->> 'bucket'::text) = '69252cdff617af002209ecdd'::text))
        Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
Planning Time: 0.137 ms
Execution Time: 0.085 ms
(8 rows)
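For reference, the composite expression index exercised in the "After" plan plausibly corresponds to DDL along these lines. This is a sketch only: NooBaa actually creates its indexes from the schema file during bootstrap, and the partial-index WHERE clause here is an assumption based on the unchanged partialFilterExpression mentioned in the walkthrough.

```javascript
// Sketch only: candidate DDL matching the index name seen in the "After"
// plan. All three key parts are expressions over the JSONB `data` column,
// which is what lets the planner satisfy the JSONB predicates directly.
const CREATE_COMPOSITE_INDEX_SQL = `
CREATE INDEX IF NOT EXISTS idx_btree_datachunks_dedup_key_system_bucket
ON datachunks (
    (data->>'dedup_key'),
    (data->>'system'),
    (data->>'bucket')
)
WHERE data ? 'dedup_key'`;

console.log(CREATE_COMPOSITE_INDEX_SQL.trim().split('\n')[0]);
```

The switch from a BitmapAnd over two indexes (scanning 1,209,940 rows in idx_btree_datachunks_id_desc) to a single index scan is what accounts for the 37.383 ms → 0.085 ms improvement above.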

@dannyzaken (Member) left a comment


@karthikps97, this looks good by itself.
Can you please go over the code in src/server/bg_services/dedup_indexer.js? This is a background worker intended to control the size of the dedup index by removing the dedup_key from chunks when they reach a specific size. This is old code from when we still used MongoDB, and it hasn't been touched much since. Try looking into the queries it does and make sure your changes did not affect it for the worse.
Unrelated to that, we should probably reconsider its purpose.

@karthikps97 (Member, Author)

I went through the module. Since we are using Postgres rather than MongoDB, the stats method in postgres_client is still a stub, so no functionality is affected. And yes, the indexer module is outdated since we no longer use MongoDB, but I can still update the index name there. What would you suggest, @dannyzaken?


Development

Successfully merging this pull request may close these issues.

MDStore.find_chunks_by_dedup_key is not hitting indexes

3 participants