Skip to content

MDStore.find_chunks_by_dedup_key is not hitting indexes #9249

@dannyzaken

Description

@dannyzaken

Environment info

  • NooBaa Version: VERSION
  • Platform: Kubernetes 1.14.1 | minikube 1.1.1 | OpenShift 4.1 | other: specify

Actual behavior

  1. during upload when dedup is enabled, MDStore.find_chunks_by_dedup_key query is not hitting the index.
nbcore=# EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='68f56dbbcfd8140022193954' and data->>'bucket'='68f87a1c8989ef0021071906' and (data->>'dedup_key' IN ('6TIeNKXnXpSlby+G1Sc//W9XexkJ0zmJf3beiXAik75e4Ep7EdfJ/tb9kL0aESBX') and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY data->>'_id' DESC;
                                                                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=287.04..287.04 rows=1 width=732) (actual time=73.327..73.350 rows=158 loops=1)
   Sort Key: ((data ->> '_id'::text)) DESC
   Sort Method: quicksort  Memory: 145kB
   ->  Bitmap Heap Scan on datachunks  (cost=268.89..287.03 rows=1 width=732) (actual time=72.440..73.149 rows=158 loops=1)
         Recheck Cond: (((data ->> 'dedup_key'::text) = '6TIeNKXnXpSlby+G1Sc//W9XexkJ0zmJf3beiXAik75e4Ep7EdfJ/tb9kL0aESBX'::text) AND (data ? 'dedup_key'::text) AND (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb)))
         Filter: (((data ->> 'system'::text) = '68f56dbbcfd8140022193954'::text) AND ((data ->> 'bucket'::text) = '68f87a1c8989ef0021071906'::text))
         Heap Blocks: exact=137
         ->  BitmapAnd  (cost=268.89..268.89 rows=16 width=0) (actual time=72.379..72.382 rows=0 loops=1)
               ->  Bitmap Index Scan on idx_btree_datachunks_dedup_key  (cost=0.00..31.43 rows=1641 width=0) (actual time=0.073..0.074 rows=158 loops=1)
                     Index Cond: ((data ->> 'dedup_key'::text) = '6TIeNKXnXpSlby+G1Sc//W9XexkJ0zmJf3beiXAik75e4Ep7EdfJ/tb9kL0aESBX'::text)
               ->  Bitmap Index Scan on idx_btree_datachunks_id_desc  (cost=0.00..237.21 rows=7537 width=0) (actual time=71.374..71.374 rows=341693 loops=1)
 Planning Time: 0.635 ms
 Execution Time: 73.444 ms
(13 rows)

Expected behavior

  1. We should modify the existing index or create a new one so that the query will not scan the entire DB.

Steps to reproduce

More information - Screenshots / Logs / Other output

When manually creating an index based also on system and bucket (as the query is), the query is using the index.

nbcore=# CREATE INDEX idx_btree_datachunks_dedup_key_new ON public.datachunks USING btree ((data ->> 'dedup_key'::text), (data ->> 'system'::text), ((data ->> 'bucket'::text))) WHERE (data ? 'dedup_key'::text AND (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb)));
CREATE INDEX
nbcore=# EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='68f56dbbcfd8140022193954' and data->>'bucket'='68f87a1c8989ef0021071906' and (data->>'dedup_key' IN ('6TIeNKXnXpSlby+G1Sc//W9XexkJ0zmJf3beiXAik75e4Ep7EdfJ/tb9kL0aESBX') and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY data->>'_id' DESC;
                                                                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2.66..2.66 rows=1 width=732) (actual time=0.773..0.794 rows=158 loops=1)
   Sort Key: ((data ->> '_id'::text)) DESC
   Sort Method: quicksort  Memory: 145kB
   ->  Index Scan using idx_btree_datachunks_dedup_key_new on datachunks  (cost=0.42..2.65 rows=1 width=732) (actual time=0.102..0.606 rows=158 loops=1)
         Index Cond: (((data ->> 'dedup_key'::text) = '6TIeNKXnXpSlby+G1Sc//W9XexkJ0zmJf3beiXAik75e4Ep7EdfJ/tb9kL0aESBX'::text) AND ((data ->> 'system'::text) = '68f56dbbcfd8140022193954'::text) AND ((data ->> 'bucket'::text) = '68f87a1c8989ef0021071906'::text))
 Planning Time: 1.708 ms
 Execution Time: 0.896 ms
(7 rows)

A similar index should be created by modifying this definition to the following:

    {
        fields: {
            system: 1,
            bucket: 1,
            dedup_key: 1,
        },
        options: {
            unique: false,
            partialFilterExpression: {
                dedup_key: { $exists: true },
                deleted: null
            }
        }
    },

This was not tested. Also, we should check if the existing index is not used elsewhere.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions