[SVCS-426] Update googledrive provider to use googledrive v3 API #276
TomBaxter wants to merge 5 commits into CenterForOpenScience:develop from TomBaxter:feature/osf-426
AddisonSchiller
left a comment
Mostly style changes + a few questions to start.
AddisonSchiller
left a comment
One change needed for a very long line.
One change moving functions around.
After this, it will be good for the next phase.
NAME = 'googledrive'
BASE_URL = settings.BASE_URL
FOLDER_MIME_TYPE = 'application/vnd.google-apps.folder'
FILE_FIELDS = {'fields': 'version, id, name, size, modifiedTime, createdTime, mimeType, webViewLink, webContentLink, md5Checksum, capabilities(canDelete, canEdit, canTrash, canDownload, canRename, canReadRevisions, canShare, canCopy)'}
path=None) -> bool:
return self == other
def build_move_url(self, src_path, dest_path):
Nitpick: you split up can_intra_move and can_intra_copy with all the build URL methods.
Move all the can_<do something> methods so they are next to each other again (including can_duplicate_names).
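For context on the FILE_FIELDS constant in the hunk above: v3 returns only the fields a request names, so the constant ends up as the `fields` query parameter. A minimal sketch, assuming a hard-coded v3 base URL (the provider reads it from settings.BASE_URL) and a hypothetical `build_file_url` helper that is not part of the provider:

```python
from urllib.parse import urlencode

# Assumption: v3 base URL; the provider reads its base from settings.BASE_URL.
BASE_URL = 'https://www.googleapis.com/drive/v3'

# Same shape as FILE_FIELDS in the hunk above, trimmed for readability.
FILE_FIELDS = {'fields': 'id, name, size, modifiedTime, mimeType, md5Checksum'}


def build_file_url(file_id, **query):
    """Hypothetical helper: a files.get URL that names the fields v3 should return."""
    params = dict(FILE_FIELDS)
    params.update(query)
    return '{}/files/{}?{}'.format(BASE_URL, file_id, urlencode(params))


print(build_file_url('abc123'))
```

Any field left out of the `fields` list is simply absent from the response, which is why a missing entry shows up later as a KeyError rather than as an API error.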
AddisonSchiller
left a comment
Looking good. Ready for next phase.
@TomBaxter Let's sit down and do an in-person review for this.
cslzchen
left a comment
A quick look with a few questions. Let's discuss in person @TomBaxter 🎆
def etag(self):
return self.raw['etag']
# Google Doc revision representations do not return etag
return '{}::{}'.format(self.raw['id'], self.raw['modifiedTime'])
How do we use etag or its alternative?
etag tracks both file and metadata change while modifiedTime only tracks file content change. Can this be a problem? Is it possible to still use v2 API for etag?
It is possible for a file to have a different modifiedTime but the same hash.
Agreed. I brought this up with @felliott at the time of writing the PR, but I can't remember the specifics of the conversation. Let's revisit with him. Not sure which is the lesser of two evils: a subpar etag or staying on v2.
Another question for @felliott: what do we use etag for?
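To make the concern in this thread concrete: the v3 fallback in the etag hunk above is built only from `id` and `modifiedTime`, so any change that does not bump `modifiedTime` (a metadata-only edit, per the comment above) produces the same value. A minimal sketch using a hypothetical `derived_etag` function and made-up metadata:

```python
def derived_etag(raw):
    """Mirror of the v3 fallback in the hunk above: built only from id and modifiedTime."""
    return '{}::{}'.format(raw['id'], raw['modifiedTime'])


before = {'id': 'abc123', 'name': 'draft.txt',
          'modifiedTime': '2017-09-01T12:00:00.000Z'}
# Hypothetical metadata-only edit that does not bump modifiedTime.
after = dict(before, name='final.txt')

print(derived_etag(before))                          # abc123::2017-09-01T12:00:00.000Z
print(derived_etag(before) == derived_etag(after))   # True
```

Whether that collision matters depends on what consumes the etag, which is the open question for @felliott.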
'id': dest_path.parent.identifier
}],
'title': dest_path.name
'name': dest_path.name
Is dropping 'parents' intentional?
Note to self: double-check where parents is retrieved.
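For the "is dropping parents intentional?" question, a sketch of how the request body shape changes between versions, assuming the hunk above builds a files.copy body. The function names are hypothetical. In v2, parents are parentReference objects and the name field is `title`; in v3, parents are plain ID strings and the field is `name`. The last helper reflects my understanding that v3 files.update takes parent changes as `addParents`/`removeParents` query parameters rather than in the body; worth confirming against the v3 reference.

```python
def copy_body_v2(parent_id, name):
    """v2 request body: parents are parentReference objects, the name is 'title'."""
    return {
        'parents': [{'kind': 'drive#parentReference', 'id': parent_id}],
        'title': name,
    }


def copy_body_v3(parent_id, name):
    """v3 request body: parents are plain ID strings, the name is 'name'."""
    return {'parents': [parent_id], 'name': name}


def move_params_v3(old_parent_id, new_parent_id):
    """v3 files.update (assumed): parent changes go in the query string, not the body."""
    return {'addParents': new_parent_id, 'removeParents': old_parent_id}


print(copy_body_v3('folder1', 'a.txt'))
```

If the copy body omits `parents` entirely, the copy presumably lands in the source file's parent, so the destination folder would be lost.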
is_docs_file = drive_utils.is_docs_file(metadata.raw)
# GoogleDrive v3 API does not allow the download of GoogleDoc revisions
# Use v2 API for this functionality
if revision and is_docs_file:
👍
The if...elif...else makes the logic much clearer than metadata.raw.get('downloadUrl') or drive_utils.get_export_link(metadata.raw). I used to have a hard time understanding how it works.
Added comments to make the meaning of the four conditions more clear.
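One plausible reading of the four download conditions, sketched as a standalone function. The base URLs, the `download_url` name, and the default export MIME type are assumptions for illustration, not the provider's code. The first branch follows the hunk above (v2 for GoogleDoc revisions). The others use the v3 `alt=media` and `export` endpoints as I understand them.

```python
from urllib.parse import urlencode

# Assumed base URLs for illustration.
BASE_V3 = 'https://www.googleapis.com/drive/v3'
BASE_V2 = 'https://www.googleapis.com/drive/v2'


def download_url(file_id, revision=None, is_docs_file=False,
                 export_mime='application/pdf'):
    """Hypothetical sketch: pick the URL for each of the four download cases."""
    if revision and is_docs_file:
        # v3 cannot download GoogleDoc revisions: fetch the v2 revision
        # metadata and use one of its export links for the actual download.
        return '{}/files/{}/revisions/{}'.format(BASE_V2, file_id, revision)
    elif revision:
        # Revision of a binary file: v3 returns the content with alt=media.
        return '{}/files/{}/revisions/{}?alt=media'.format(BASE_V3, file_id, revision)
    elif is_docs_file:
        # Current GoogleDoc: export it to a concrete format.
        query = urlencode({'mimeType': export_mime})
        return '{}/files/{}/export?{}'.format(BASE_V3, file_id, query)
    else:
        # Current binary file: download the content directly.
        return '{}/files/{}?alt=media'.format(BASE_V3, file_id)
```

Only the first branch touches v2, which keeps the eventual v2 removal confined to one place.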
# gdrive doesn't support intra-copy on folders
return self == other and (path and path.is_file)
def build_move_url(self, src_path, dest_path):
Too many methods here; they are not intuitive to use. Let's discuss whether we can have only two util methods, build_v3_url() and build_v2_url(), with the actions passed as arguments.
Yes, I made them separate for ease of use in tests. I have now refactored them all out. Complete.
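A minimal sketch of the two-builder idea suggested above. The actions are passed as path segments and query parameters are passed as keyword arguments. The base URLs and function names are assumptions for illustration. This is not waterbutler's existing `build_url`.

```python
from urllib.parse import quote, urlencode

# Assumed base URLs for illustration.
BASE_URL_V3 = 'https://www.googleapis.com/drive/v3'
BASE_URL_V2 = 'https://www.googleapis.com/drive/v2'


def _build_url(base, *segments, **query):
    # Each path segment is percent-encoded; keyword arguments become the query string.
    path = '/'.join(quote(str(segment), safe='') for segment in segments)
    url = '{}/{}'.format(base, path) if path else base
    if query:
        url += '?' + urlencode(query)
    return url


def build_v3_url(*segments, **query):
    return _build_url(BASE_URL_V3, *segments, **query)


def build_v2_url(*segments, **query):
    return _build_url(BASE_URL_V2, *segments, **query)


print(build_v3_url('files', 'abc123', 'copy'))
print(build_v2_url('files', 'abc123', 'revisions', '42'))
```

Putting the API version in the function name would make the `.replace('/v3/', '/v2/', 1)` step in the revision-download hunk unnecessary.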
# GoogleDrive v3 API does not allow the download of GoogleDoc revisions
# Use v2 API for this functionality
if revision and is_docs_file:
meta_url = self.build_url('files', path.identifier, 'revisions', revision)
As mentioned before, let's make a build_url_v2().
I'm happy to do this. However, I was holding off on creating a new method for logic that will only run in one place and will likely be removed entirely in the future. If it turns out that we want to use v2 for retrieving etags, then it might make more sense to make build_url_v2 its own thing.
meta_url = self.build_url('files', path.identifier, 'revisions', revision)
meta_url = meta_url.replace('/v3/', '/v2/', 1)
async with self.request(
There is some difference between the old and the new logic. Let's discuss.
return {
'parents': [
{
'kind': 'drive#parentReference',
Is this change related to V3 upgrade?
I'm not sure why it was coded that way previously. I don't believe there was a change in the API, and I don't believe it was coded correctly originally.
},
]
DOCS_MIMES = [format['mime_type'] for format in DOCS_FORMATS]
DOCS_DEFAULT_FORMAT = {
Should we add 'mime_type': None to the default one?
Checked the code. mime_type is only referenced after a check for is_google_doc, so I'm adding mime_type = '' for consistency, though it currently will never be referenced.
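Why giving the default entry a `mime_type` key helps: every format dict then has the same keys, so callers can index any of them without a special case. A sketch with a hypothetical format table and `get_format` helper. Only `mime_type`, `DOCS_FORMATS`, `DOCS_MIMES` and `DOCS_DEFAULT_FORMAT` come from the hunk above. The `ext` and `download_ext` keys are illustrative.

```python
# Hypothetical format table: only 'mime_type' appears in the hunk above,
# the other keys are illustrative.
DOCS_FORMATS = [
    {'mime_type': 'application/vnd.google-apps.document', 'ext': '.gdoc', 'download_ext': '.docx'},
    {'mime_type': 'application/vnd.google-apps.spreadsheet', 'ext': '.gsheet', 'download_ext': '.xlsx'},
]
DOCS_MIMES = [fmt['mime_type'] for fmt in DOCS_FORMATS]
# Adding 'mime_type': '' gives the default the same keys as every other entry.
DOCS_DEFAULT_FORMAT = {'mime_type': '', 'ext': '', 'download_ext': '.pdf'}


def get_format(mime_type):
    """Return the matching format, or the default for unknown MIME types."""
    for fmt in DOCS_FORMATS:
        if fmt['mime_type'] == mime_type:
            return fmt
    return DOCS_DEFAULT_FORMAT


print(get_format('image/png')['mime_type'] == '')  # True
```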
resp = await self.make_request(
'GET',
self.build_url('files',
q="'{}' in parents".format(file_id),
We didn't have trashed = false before?
It prevents returning trashed (deleted) files.
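A sketch of the folder-listing query with the `trashed = false` clause, using a hypothetical `children_query` helper. It shows the raw `q` value and how it looks once URL-encoded. The query syntax matches the hunk above plus the added clause.

```python
from urllib.parse import urlencode


def children_query(folder_id, include_trashed=False):
    """Build the files.list 'q' parameter for the children of a folder."""
    q = "'{}' in parents".format(folder_id)
    if not include_trashed:
        # Without this clause, files in the trash are listed too.
        q += ' and trashed = false'
    return {'q': q}


print(urlencode(children_query('abc123')))
# q=%27abc123%27+in+parents+and+trashed+%3D+false
```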
self.metrics.add('_file_metadata.user_capabilities', data['capabilities'])
if drive_utils.is_docs_file(data):
if can_access_revisions:
if data['capabilities'].get('canReadRevisions', None) and data['capabilities']['canReadRevisions']:
How about this?
if data.get('capabilities', {}).get('canReadRevisions', None):
These do the same thing, but yours is shorter. Change made.
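A quick check of the equivalence claimed above. Both forms agree whenever `capabilities` is present. They differ only when it is missing: the original form raises KeyError, while the suggested one returns a falsy value. The function names are hypothetical wrappers around the two expressions.

```python
def original_check(data):
    # Form in the hunk above: raises KeyError if 'capabilities' is missing.
    caps = data['capabilities']
    return bool(caps.get('canReadRevisions', None) and caps['canReadRevisions'])


def suggested_check(data):
    # Suggested form: a missing 'capabilities' object is treated as "no".
    return bool(data.get('capabilities', {}).get('canReadRevisions', None))


for caps in ({}, {'canReadRevisions': False}, {'canReadRevisions': True}):
    data = {'capabilities': caps}
    print(original_check(data), suggested_check(data))

print(suggested_check({}))  # False
```

The earlier `self.metrics.add(..., data['capabilities'])` line in the same hunk still indexes directly, so a response without `capabilities` would fail there first.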
cslzchen
left a comment
Thanks for the updates. Looks good to me. I will test it fully locally (and make a PR with minor changes if necessary).
[SVCS-426] GoogleDrive migration guide https://developers.google.com/drive/v3/web/migration
Update: secondary PR https://github.com/TomBaxter/waterbutler/pull/1 pending.
[SVCS-426] Style Refactor and Minor Code Change
Update: a second secondary PR https://github.com/TomBaxter/waterbutler/pull/2 pending.
- gdrive metadata and provider tests
[SVCS-426] Update Style and Trivial Code Change for Tests
# Use dummy ID if no revisions found
metadata = await self.metadata(path, raw=True)
# TODO: considering keep using v2 to obtain etag
Phase 2: etag is no longer available through v3, but we can still use v2.
Phase 2: please be aware that using id and modifiedTime (even together with md5Checksum) is not equivalent to etag. Here is a useful discussion I found: https://stackoverflow.com/questions/42174600/alternative-for-etag-field-in-google-drive-v3.
Ticket
SVCS-426
Purpose
Update GoogleDrive provider to v3 of the GoogleDrive API
Changes
Substantial changes to all aspects of the provider.
Of particular note:
- The GD v3 API has no method for downloading GoogleDoc revisions. This PR keeps GD v2 calls for that, in order to maintain this functionality as long as possible.
- GD v3 returns minimal representations of resources. Fields must be requested explicitly in order to be returned.
- Many field names changed. The most common ones in the provider:
  - title -> name
  - modifiedDate -> modifiedTime
  - fileSize -> size
- etags are no longer available from the GoogleDrive v3 API.
- exportLinks are no longer available in the 'file' representation of GoogleDoc files.
- GD v3 returns lists of resources under either 'files' or 'revisions', whereas GD v2 returned 'items' for all resource lists.
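The renames and list-key change above can be summarized in a small sketch. Both helpers are hypothetical, not part of the provider. `rename_v2_fields` maps the three v2 keys listed above to their v3 names. `list_resources` reads the v3 list key (`files` or `revisions`) and falls back to v2's generic `items`.

```python
# v2 -> v3 renames listed above.
V2_TO_V3_FIELDS = {'title': 'name', 'modifiedDate': 'modifiedTime', 'fileSize': 'size'}


def rename_v2_fields(item):
    """Hypothetical helper: rename v2 metadata keys to their v3 equivalents."""
    return {V2_TO_V3_FIELDS.get(key, key): value for key, value in item.items()}


def list_resources(response, resource):
    """Return the resource list: v3 keys it by 'files' or 'revisions', v2 by 'items'."""
    if resource in response:
        return response[resource]
    return response.get('items', [])


print(rename_v2_fields({'id': 'abc123', 'title': 'notes.txt', 'fileSize': '42'}))
print(list_resources({'files': [{'id': 'abc123'}]}, 'files'))
print(list_resources({'items': [{'id': 'abc123'}]}, 'files'))
```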
Side effects
None expected
QA Notes
This provider will need a full test of all functionality.
Deployment Notes
We will need to keep an eye on end-of-life announcements for the GoogleDrive v2 API, as we have kept v2 calls for downloading GoogleDoc revisions.