@tsafin tsafin commented Nov 26, 2022

  • Check the expected results of loading the dbgen data, and fail the whole procedure if some rows were missed;
  • Extend the fiber slice time via `require 'fiber'.set_max_slice(60 * 60)` so the data population procedure is not aborted;

P.S.

  • Discovered that there was no active CI connected to pull requests in this repository, so added a simple patch
    to run the GH action on the pull_request event instead of the push event.
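The trigger change amounts to roughly the following in the workflow file (a sketch; the file name and branch name are assumptions, not taken from this PR):

```yaml
# Illustrative workflow trigger; the actual file lives under .github/workflows/
on:
  pull_request:          # run CI when a PR is opened or updated
  push:
    branches: [master]   # optionally keep running CI on direct pushes too
```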

As we lately discovered the hard way, we may
end up with an incomplete TPC-H dataset load (due
to various changes in core behaviour), so we now
introduce a careful check that everything we expected to
load has actually been committed to the populated space, i.e.

```
Loading:        region
.... 5 rows loaded
Loading:        nation
.... 25 rows loaded
Loading:        part
.... 40000 rows loaded
Loading:        supplier
.... 2000 rows loaded
Loading:        partsupp
.... 160000 rows loaded
Loading:        customer
.... 30000 rows loaded
Loading:        orders
.... 300000 rows loaded
Loading:        lineitem
.... 1199969 rows loaded
```

- during the load we count the number of loaded rows
- this count is then checked against the number
  returned by `select count(*) from T`

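In outline, the check can be sketched as a pure function (names are illustrative; in the actual script `committed` would come from `box.execute(('select count(*) from %s'):format(sql_table)).rows[1][1]`):

```lua
-- Sketch of the post-load sanity check (illustrative names).
-- `line_count` is the number of rows counted while reading the .tbl
-- file; `committed` is what `select count(*)` reports afterwards.
local function check_load(sql_table, line_count, committed)
    if committed ~= line_count then
        error(('%s: expected %d rows, but only %d committed'):format(
            sql_table, line_count, committed))
    end
    return true
end
```

For example, `check_load('region', 5, 5)` passes, while a short row count raises an error and fails the whole load.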
The behaviour of long queries has changed in the
Tarantool master since tarantool#6085:
long transactions may be aborted if they exceed the hard
limit for a slice (1.0 sec).

Use `require 'fiber'.set_max_slice(60 * 60)` to make sure
the loading of no row in the TPC-H dataset is aborted.
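In the loader this can be guarded so the script also runs on builds that predate slice limits (a sketch; the `~= nil` feature check is an assumption, not code from this PR):

```lua
-- Sketch: lift the fiber slice hard limit before the long load.
-- Since tarantool#6085 a transaction exceeding the slice limit
-- (1.0 sec by default) is aborted; one hour is ample for TPC-H.
local ok, fiber = pcall(require, 'fiber')  -- Tarantool-only module
if ok and fiber.set_max_slice ~= nil then  -- absent on older builds
    fiber.set_max_slice(60 * 60)
end
```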
@tsafin tsafin requested a review from ImeevMA November 26, 2022 20:06
Run CI on pull-request update, instead of a
direct push to the repository.

Added libunwind-dev to the dependencies list.

@ImeevMA ImeevMA left a comment


Hi! Thank you for the patches! I have a few comments. Also, please check your commit messages for spelling errors. And I think the commit messages will look better if you wrap the body at a maximum line length of 72 characters (with the header limited to 50 characters).

```
local rc, msg = pcall(box.space[sql_table].insert, box.space[sql_table], tuple)
if not rc then
print(('database %s failed to insert tuple %s: %s'):
format(sql_table, require'json'.encode(tuple), msg))
```

These two lines seem rather strange to me: the indentation is off, and moving the method call to the next line looks odd. It might be worth changing it to something like:

```
local str = 'database %s failed to insert tuple %s: %s'
print(str:format(sql_table, require'json'.encode(tuple), msg))
```

```
end
f:close()
box.commit()
-- sanity check - assume we have successfully inserted all tuples
```
Can we throw an error here if not all tuples were inserted successfully? Also, is there a way to add tests to TPC-H?

```
-- sanity check - assume we have successfully inserted all tuples
local check_lines = box.execute(('select count(*) from %s'):format(sql_table))
if not check_lines then
error(('database %s is not existing'):format(sql_table))
```
The error message looks rather strange. Can we change it to `database %s does not exist`?
