- 
                Notifications
    You must be signed in to change notification settings 
- Fork 619
[15721] Compiled Catalog Query Access #1339
base: master
Are you sure you want to change the base?
Conversation
add in predicate fix enable debug enable debug logger use seq scan for get table test compile fix unused write table object constructor fix bug query execute fix bug fix bug fix bug fix query param print wrapped tuple fix bug cache cache fix try fix shared pointer clean up performance test GetDatabaseObject column_catalog.pp/h database_catalog fix seqscan fix db_catalog db_catalog fix bug db_catalog column_catalog.cpp/h index catalog lan catalog column stat lang catalog settings catalog trigger catalog zone map catalog proc_catolog.cpp/h delete code clean up db and trigger cleanup settings catalog delete code restore catalog test format fix include format fix bug in scan plan by prashanth add index test fixed binding for tuple_value_expression, and changed query_cache_test added Insert and Delete with Compiled Query in abstract_catalog and table_catalog compiled seq plan for table catalog by looking up table_oid fix trigger changed trigger_test, changed the wrong assumption that triggers are in a certain order fix settings catalog query metrics catalog query metrics catalog Changed zone_map_catalog, having issue running zone_map_scan_test using expressionPtr Edited cloumn_catalog, index_catalog, proc_catalog, settings_catalog Added Insert and Delete with Compiled Query Fixed Binding for TupleValueExpression database catalog insert database catalog bound index metrics catalog insert query history catalog table metrics insert database catalog delete modify catalog inserts change to complied insert plan intex metrics catalog delete table metrics catalog delete update catalog to use compiled delete plan fixed code review addressed issue trigger catalog bound table catalog bound language catalog bound added insert, delete and bouding for zone_map_catalog clean up deleted redundant comments in index_catalog index cache uncomment fix proc catalog fix proc catalog added bind in language_catalog's delete add comment to zone map manager
* Adding new mapping table * Revert "Fixing non-unique key insert problem" This reverts commit 4267752. * Revert LOG_INFO to LOG_TRACE * Fix segment fault problem by moving munmap() to after ~EpochManager() * Avoid compiler error * Enhance log message for mmap()'ed mapping table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure header files newly added are necessary. And optimizations can be made in the engineering side.
        
          
                src/catalog/abstract_catalog.cpp
              
                Outdated
          
        
      | * @param insert_values tuples to be inserted | ||
| * @param txn TransactionContext | ||
| * @return Whether insertion is Successful | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned last time, move comments to header files.
        
          
                src/catalog/abstract_catalog.cpp
              
                Outdated
          
        
      | * @param predicate Predicate used in the seq scan | ||
| * @param txn TransactionContext | ||
| * @return Whether deletion is Successful | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, move comments to header files.
        
          
                src/catalog/abstract_catalog.cpp
              
                Outdated
          
        
      | * @param txn TransactionContext | ||
| * | ||
| * @return Unique pointer of vector of logical tiles | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move comments to header files.
        
          
                src/catalog/abstract_catalog.cpp
              
                Outdated
          
        
      | * @param column_offsets columns used for seq scan | ||
| * @param predicate Predicate used in the seq scan | ||
| * @return true if successfully executes | ||
| */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move comments to header files.
        
          
                src/catalog/column_catalog.cpp
              
                Outdated
          
        
      | auto constant_expr_7 = new expression::ConstantValueExpression( | ||
| val7); | ||
|  | ||
| tuples.push_back(std::vector<ExpressionPtr>()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use emplace_back instead of push_back as the former would construct the object immediately.
        
          
                src/catalog/settings_catalog.cpp
              
                Outdated
          
        
      | config_value = (*result_tiles)[0]->GetValue(0, 0).ToString(); | ||
| } | ||
| PELOTON_ASSERT(result_tuples.size() <= 1); | ||
| if (result_tuples.size() != 0) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use empty method to check for emptiness.
        
          
                src/catalog/settings_catalog.cpp
              
                Outdated
          
        
      | config_value = (*result_tiles)[0]->GetValue(0, 0).ToString(); | ||
| } | ||
| PELOTON_ASSERT(result_tuples.size() <= 1); | ||
| if (result_tuples.size() != 0) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use empty method to check for emptiness.
        
          
                test/catalog/catalog_test.cpp
              
                Outdated
          
        
      | // EXPECT_EQ(nullptr, table_object_1); | ||
| // txn_manager.CommitTransaction(txn); | ||
| //} | ||
| // | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have problems passing this test? Are you going to uncomment it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the test we create for #1336 . Now it has been fixed.
| #include "storage/storage_manager.h" | ||
| #include "type/ephemeral_pool.h" | ||
| #include "sql/testing_sql_util.h" | ||
| #include "common/timer.h" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to include this header file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for performance testing. We plan to remove it later.
| // 1: tgrelid : table_oid | ||
| // 2: tgname : trigger_name | ||
| // 3: tgfoid : function_oid | ||
| // 3: tgfoid : function_name | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should tgfoid be changed if function_oid was meant to be function_name?
| codegen::BufferingConsumer buffer{{}, context}; | ||
|  | ||
|  | ||
| bool cached; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to define cached below where it's first assigned to a value.
|  | ||
| codegen::BufferingConsumer buffer{column_offsets, context}; | ||
|  | ||
| bool cached; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the cached definition to where it's first assigned.
|  | ||
| // Create consumer | ||
| codegen::BufferingConsumer buffer{column_offsets, scan_context}; | ||
| bool cached; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the cached definition to where it's first assigned.
|  | ||
| codegen::BufferingConsumer buffer{column_offsets, context}; | ||
|  | ||
| bool cached; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the cached definition to where it's first assigned.
|  | ||
| // execute SELECT a FROM table where a == 40; | ||
| codegen::BufferingConsumer buffer_1{{0, 1}, context_1}; | ||
| bool cached; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the cached definition to where it's first assigned.
        
          
                src/catalog/column_catalog.cpp
              
                Outdated
          
        
      | const std::vector<Constraint> &constraints, | ||
| type::AbstractPool *pool, | ||
| concurrency::TransactionContext *txn) { | ||
| (void) pool; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could probably use the "UNUSED ATTRIBUTE" specifier instead.
The same applies to a bunch of other places where "(void) attr" has been used to pacify the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we would want to keep the parameter if we are not using it at all.
| val11)); | ||
| values.emplace_back(new expression::ConstantValueExpression( | ||
| val12)); | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the multiple mallocs invocations can be coalesced using an array of objects.
|  | ||
| auto *table_oid_expr = | ||
| new expression::TupleValueExpression(type::TypeId::INTEGER, 0, | ||
| ColumnId::TABLE_OID); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the allocated memory has been freed. Irrespective, you should probably use smart pointers instead of working with raw pointers. The same applies to a bunch of other places in the code where raw pointers could be replaced with smart pointers.
| most_common_freqs = tuple.GetValue(ColumnId::MOST_COMMON_FREQS); | ||
| hist_bounds = tuple.GetValue(ColumnId::HISTOGRAM_BOUNDS); | ||
| column_name = tuple.GetValue(ColumnId::COLUMN_NAME); | ||
| has_index = tuple.GetValue(ColumnId::HAS_INDEX); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pooja mentioned in the last meeting that tuple.GetValue() is going to deprecated. However, I vaguely remember her mentioning that it could still be used with catalog tables. You might want to talk to her/Andy about this.
| */ | ||
| void InitMappingTable() { | ||
| mapping_table = (std::atomic<const BaseNode *> *) \ | ||
| mmap(NULL, 1024 * 1024 * 1024, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 
The value (1024 x 1024 x 1024) shouldn't be hardcoded (the MAPPING_TABLE_SIZE is #defined to 1MB). 
- 
InitMappingTable() seems to be invoked from the bwtree constructor - is 1GB mmaped for every index created? If so, this seems like an overkill. 
| // NOTE: Only unmap memory here because we need to access the mapping | ||
| // table in the above routine. If it was unmapped in ~BwTree() then this | ||
| // function will invoke illegal memory access | ||
| int munmap_ret = munmap(tree_p->mapping_table, 1024 * 1024 * 1024); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, a hard-coded value shouldn't be used.
| private: | ||
| ColumnStatsCatalog(concurrency::TransactionContext *txn); | ||
| std::vector<oid_t> all_column_ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the purpose of all_column_ids is to use it with a copy constructor to construct a vector with all the column oid's.
Firstly, I don't think you need to declare an all_column_ids vector, it can be easily constructed explicitly at runtime. Moreover hardcoding the values it is not a good idea since if anyone changes the schema of the catalog table, he/she must also remember to update the vector declaration.
A better way to construct this at runtime without breaking abstractions is by using the "NumColumns" method and filling the vector using std::iota.
The same applies to every declaration of all_columns_ids and column_ids.
| return default_value.get(); | ||
| } | ||
| type::Value *getDefaultValue() { return default_value.get(); } | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably be made inline.
| TYPE_OFF = 2 | ||
| }; | ||
| enum ZoneMapOffset { MINIMUM_OFF = 0, MAXIMUM_OFF = 1, TYPE_OFF = 2 }; | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why a strongly typed enum has been replaced by an old-style enum. It's better to stick to stongly typed enums.
The same applies to a bunch of places where old-style enums have been used.
| val6); | ||
| auto constant_expr_7 = new expression::ConstantValueExpression( | ||
| val7); | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the number of objects to be allocated is known at compile time. Multiple malloc invocations can be replaced by a single invocation by allocating an array of objects with parameterized constructor.
eg: arr_constant_expr = new expression::ConstantValueExpression[8]{{val0},{val1}...{val7}}
This same applies to a bunch of other places where the number of objects to be allocated is known at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the code looks clean, other than some minor changes. Good job on including the doxygen comments.
|  | ||
| // search for query | ||
| codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);; | ||
| std::unique_ptr<codegen::Query> compiled_query(nullptr); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This object compiled_query is never used if the query is cached. Why create it every time? Can be moved inside the if block
| query->Execute(std::move(executor_context), buffer, | ||
| [&ret](executor::ExecutionResult result) { ret = result; }); | ||
|  | ||
| return ret.m_result == peloton::ResultType::SUCCESS; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the query in our cache even if it fails?
I am not sure if this is an overkill or even correct but we could probably keep only the compiled queries that return success in our cache
        
          
                src/catalog/abstract_catalog.cpp
              
                Outdated
          
        
      | new executor::ExecutorContext(txn, std::move(parameters))); | ||
|  | ||
| // search for query | ||
| codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make this a function and reuse the code instead of re writing it for every other function. Something like GetCompiledQuery() which searches for the query in the cache and if not found, compiles it and adds it to the cache.
We can even make it a macro or an inline function if it is on the critical path and we want to save a function call
|  | ||
| size_t column_count = catalog_table_->GetSchema()->GetColumnCount(); | ||
| for (size_t col_itr = 0; col_itr < column_count; col_itr++) { | ||
| // Skip any column for update | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this comment means here. Could you please help me understand what this means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is only looking for the columns in the tuple which need to be updated. For more detailed explanation you can turn to @mengranwo
| : table_oid(tile->GetValue(tupleId, ColumnCatalog::ColumnId::TABLE_OID) | ||
|  | ||
| ColumnCatalogObject::ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple) | ||
| : table_oid(wrapped_tuple.GetValue(ColumnCatalog::ColumnId::TABLE_OID) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by @gandeevan, GetValue is deprecated now. Please talk to @poojanilangekar to make sure you can use this for a wrapped tuple object safely
| tuple->SetValue(ColumnId::COLUMN_NAME, val_column_name, pool); | ||
| tuple->SetValue(ColumnId::HAS_INDEX, val_has_index, nullptr); | ||
| tuples.emplace_back(); | ||
| // tuples.push_back(std::vector<ExpressionPtr>()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code
        
          
                src/catalog/language_catalog.cpp
              
                Outdated
          
        
      | LanguageCatalogObject::LanguageCatalogObject(executor::LogicalTile *tuple) | ||
| : lang_oid_(tuple->GetValue(0, 0).GetAs<oid_t>()), | ||
| lang_name_(tuple->GetValue(0, 1).GetAs<const char *>()) {} | ||
| LanguageCatalogObject::LanguageCatalogObject(codegen::WrappedTuple tuple) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make a copy of the WrappedTuple parameter? Wouldn't using a reference suffice?
        
          
                src/catalog/proc_catalog.cpp
              
                Outdated
          
        
      | #define PROC_CATALOG_NAME "pg_proc" | ||
|  | ||
| ProcCatalogObject::ProcCatalogObject(executor::LogicalTile *tile, | ||
| ProcCatalogObject::ProcCatalogObject(codegen::WrappedTuple wrapped_tuple, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. Is a reference not good enough? This applies to all such uses in other constructors as well
| GetResultWithCompiledSeqScan(column_ids, predicate, txn); | ||
|  | ||
| // carefull! the result could be null! | ||
| LOG_INFO("size of the result tiles = %lu", result_tuples.size()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? Should it be LOG_INFO?
| class ColumnCatalogObject { | ||
| public: | ||
| ColumnCatalogObject(executor::LogicalTile *tile, int tupleId = 0); | ||
| ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a reference instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the test cases fail when running make check
The following tests FAILED:
52 - stats_test (Failed)
54 - zone_map_scan_test (Failed)
Errors while running CTest
make[3]: *** [test/CMakeFiles/check] Error 8
make[2]: *** [test/CMakeFiles/check.dir/all] Error 2
make[1]: *** [test/CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
akanjani@dev3:~/review/peloton/build$ git status
On branch pcq-cp2
Your branch is up-to-date with 'origin/pcq-cp2'.
nothing to commit, working directory clean
akanjani@dev3:~/review/peloton/build$
1. valgrind memory leak during fillepredicatearray 2. uninitialized Value being moved/destroyed - this has been solved 3. Concurrency issue, need to put an issue on this
…alue to pass by reference. Also add UNUSED_ATTRIBUTE instead of (void).
| I am reviving this PR. We will need this when we get rid of the interpreted engine. It also has bug fix for #1362 | 
We enabled complied catalog lookup in the first pull request. Based upon this, we further support complied insert and delete queries for all catalogs. We also fix bugs for the following issues:
QueryCache Bug #1298 Need to manually bind the tuple value expression so that equality checks will work correctly.
Sequential scan assumes that the output columns should start at offset 0 otherwise PerformBinding will not correctly find the corresponding column attributes.
ZoneMapCatalog needs special care to avoid chicken and egg problem. Each sequential plan needs to check the zone map to know whether to scan the tile or not, but checking the zone map requires ZoneMap catalog access which leads to an infinity loop. So we ensure in the ZoneMap catalog manager that if the scanning table is ZoneMap catalog then it will just scan it without checking the ZoneMap