ClickHouse version of import users script#1317
Conversation
|
still need to test changes here |
|
Did some testing. Further changes:
|
| added_user_rows.append([user_email, user_name, user.enabled]) | ||
| added_authority_rows += [[user_email, authority] for authority in user.authorities] | ||
| if added_user_rows: | ||
| ch_client.insert('users', added_user_rows, column_names=['email', 'name', 'enabled']) |
There was a problem hiding this comment.
let's add error checking like we were doing before for SQL here?
There was a problem hiding this comment.
Yes .. if the insert fails I wonder what happens. I suppose ch_client will throw an exception? At the very least we should confirm that if this happens we will get a stacktrace or some kind of indication in the run logs. A notice to our slack channel (letting us know that we failed an attempt to add a user to the user table) should probably also be made.
There was a problem hiding this comment.
I added back code that will email the failure message to users if there is an exception thrown during inserts.
averyniceday
left a comment
There was a problem hiding this comment.
General changes look good - couple suggestions /questions regarding readiability and organization
sheridancbio
left a comment
There was a problem hiding this comment.
I breezed over the parts which seemed to be clear language conversions to accommodate python 3 syntax. So long as this works in production I approve. I suppose the inherent retry is still present because each time this runs we re-compute the list of users who are missing from the database and try to import them again.
| added_user_rows.append([user_email, user_name, user.enabled]) | ||
| added_authority_rows += [[user_email, authority] for authority in user.authorities] | ||
| if added_user_rows: | ||
| ch_client.insert('users', added_user_rows, column_names=['email', 'name', 'enabled']) |
There was a problem hiding this comment.
Yes .. if the insert fails I wonder what happens. I suppose ch_client will throw an exception? At the very least we should confirm that if this happens we will get a stacktrace or some kind of indication in the run logs. A notice to our slack channel (letting us know that we failed an attempt to add a user to the user table) should probably also be made.
|
@sheridancbio I'm wary of adding code for Slack notifications -- if something goes wrong with the network for example, we will be getting pinged every minute by this script. I did improve the error handling for failed DB operations however |
|
@copilot resolve the merge conflicts in this pull request |
|
We decided that we will be leaving the behavior on failure as-is for now but we should probably address it in the future so that users don't get spammed with emails if something goes wrong with our networking, for example. |
7402508 to
239c95e
Compare
| SMTP_SERVER=`cat $MAIL_SMTP_SERVER` | ||
| GENIE_BLUE_DATABASE_PROPERTIES_FILENAME="portal.properties.genie.blue" | ||
| GENIE_GREEN_DATABASE_PROPERTIES_FILENAME="portal.properties.genie.green" | ||
| GENIE_BLUE_CLICKHOUSE_CLIENT_CONFIG_FILEPATH="/data/portal-cron/pipelines-credentials/clickhouse_client_genie_blue_config.yaml" |
There was a problem hiding this comment.
removed these lines as we're no longer using the clickhouse CLI
averyniceday
left a comment
There was a problem hiding this comment.
Overall looks good to me - only one change -- let's just replace the contents of the existing importUsers.py script instead of having a separate importUsersClickhouse.py script. That way we do cleanup as part of this rollout, thanks!
|
Overall, I'm good with merging this PR. But one thing I'm not sure I understand ... why did we switch from using the command line executable "clickhouse client" and switch to using the python third party library "clickhouse_connect"? I'm not against this ... but it means that now we have some parts of our code using |
Summary of changes
users/authoritiestable in a single batched SQL transaction--ssl-caflag which no longer applies (this was only relevant for RDS)The portal.properties files also need to be updated in order to include clickhouse creds -- will submit a PR to pipelines-configuration later
cc @averyniceday @sheridancbio