[SVCS-334]Adding custom sentry sanitizer#297
[SVCS-334]Adding custom sentry sanitizer#297AddisonSchiller wants to merge 3 commits intoCenterForOpenScience:developfrom
Conversation
cslzchen
left a comment
There was a problem hiding this comment.
As discussed, need recursion for dict type and a question on UTF-8.
| MASK = '*' * 8 | ||
|
|
||
| # Token and key added from original. Key is used by Dataverse | ||
| FIELDS = frozenset([ |
There was a problem hiding this comment.
For phase 2 consideration: double check the list.
There was a problem hiding this comment.
@TomBaxter My suggestion is check if there are other possible keys that may store sensitive or secret information. Here is the default keys from SanitizePasswordProcessor: https://github.com/htquach/raven-python/blob/master/raven/processors.py#L72.
FIELDS = frozenset([
'password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey',
'pw', 'cred'
])We don't have 'pw', 'cred' and we added 'access_token', 'token', 'key' and 'sentry_dsn'. Other possible ones are 'refresh_token', 'code', etc. We need to understand what we need and what we don't.
There was a problem hiding this comment.
Understood. I plan on making better use of inheritance so that we don't miss any defaults in the parent. I'll also check the sentry data for Dataverse, for other possible things to sanitize. I'll definitely add refresh_token.
| if isinstance(value, str) and self.VALUES_RE.search(value): | ||
| return self.MASK | ||
|
|
||
| if isinstance(value, dict): |
There was a problem hiding this comment.
Need recursion for dict + a test fot it.
There was a problem hiding this comment.
Complete, for lists as well
| # Just in case we have bytes here, we want to turn them into text | ||
| # properly without failing so we can perform our check. | ||
| if isinstance(key, bytes): | ||
| key = key.decode('utf-8', 'replace') |
There was a problem hiding this comment.
Is it guaranteed to be correctly UTF-8 encoded?
There was a problem hiding this comment.
Unsure. This line is directly from raven, so I assume so.
There was a problem hiding this comment.
I checked the code base and it turns out we rarely try en/decoding. Let's leave it this way. In the future we may want to have a cleanup ticket that add try...except wrapper on this.
There was a problem hiding this comment.
ill add a comment
0aae5df to
fb273ea
Compare
56d7c6b to
b776bc8
Compare
Johnetordoff
left a comment
There was a problem hiding this comment.
Just an import order problem and some debugging stuff to get rid of.
| @@ -0,0 +1,196 @@ | |||
| import pytest | |||
| from unittest import mock | |||
There was a problem hiding this comment.
unittest is a built-in library, import order.
tests/server/test_sanitize.py
Outdated
| result = sanitizer.sanitize('sanitize_dict', sanitize_dict) | ||
|
|
||
| # Sanity check | ||
| assert result != { |
There was a problem hiding this comment.
Not necessary given the next the statement.
1 similar comment
| debug=debug, | ||
| ) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__, |
There was a problem hiding this comment.
Need to update this after bumping Update: ignore this commentsetuptools to 37.0.0.
There was a problem hiding this comment.
@TomBaxter Thanks for catching this. I probably was thinking about MFR. No need to change.
| MASK = '*' * 8 | ||
|
|
||
| # Token and key added from original. Key is used by Dataverse | ||
| FIELDS = frozenset([ |
| debug=debug, | ||
| ) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__, |
|
|
||
| key = key.lower() | ||
| for field in self.FIELDS: | ||
| if field in key: |
There was a problem hiding this comment.
Check key directly from self.FIELDS
|
|
||
|
|
||
| class WBSanitizer(SanitizePasswordsProcessor): | ||
| """Asterisk out things that look like passwords, keys, etc.""" |
There was a problem hiding this comment.
Make better use of parent class.
|
As mentioned, this PR is closed and continues in #313. |
Ticket
https://openscience.atlassian.net/browse/SVCS-334
Purpose
Sanitize and censor tokens we get back from Dataverse that show up in local variables
Changes
Added a sanitizer based that inherits from the default raven one. It has increased functionality to parse through dictionaries, look for dataverse keys, and a larger scope of variable names to sanitize (key, token)
NOTE:
sanitize.pyandtest_sanitize.py's location in waterbutler are pretty variable. I put them in server just because that was the easiest place at the time. if they should live somewhere else, that is an easy change, and something that might need to be considered.Side effects
Some sentry things that we don't want to sanitize may end up being sanitized.
QA Notes
To test, you need to trigger an error that will log a Dataverse API token in a local variable or request some how.
An easy way to do this is if locally testing, upgrade your furl to 1.0.1 in Waterbutler with Dataverse attached to your project. You will also need to add a personal sentry account to your Waterbutler (through your raven.config file, or by manually putting it in your settings).
One thing to note: If an API key shows up in the actual error message, this is currently not censored (not sure best way to go about this, but the only time I ever encountered this was with the error :
Bad API token.. so the token is not valid anymore anyway.To trigger the above "Bad API token" functionality, attach a dataverse account. Then on dataverse, go refresh your token and refresh the page. This error will trigger until the OSF has to revalidate (Or something like that. This only happened to me once or twice).
There are other manual ways to test locally, such as raising errors in odd places with variables named things like 'key' or having a variable value that looks like a dataverse token etc.
Deployment Notes