-
Notifications
You must be signed in to change notification settings - Fork 531
AWS v2 SDK for S3 (v1 EOL soon) #11360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS v2 SDK for S3 (v1 EOL soon) #11360
Conversation
|
I have done some quick test on our Rados Ceph storage (S3 compatible) and I can upload file only without direct upload. With direct upload, I have a CORS error on my browser. In addition, on (IQSS) develop branch, I don't have this error with same configuration. I'll look at the JS code another day and try to understand it. |
|
|
pdurbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still testing (and waiting for Jenkins) but here's a bit more feedback.
|
|
||
| //then | ||
| assertEquals("<root/>", stringWriter.toString()); | ||
| assertEquals("<root></root>", stringWriter.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. Why did this behavior change? Should we add it to the release note and the API changelog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note - feel free to tweak or remove. As far as I can tell, the XML libraries we use have changed whether they serialize empty elements using self-closing tags or not. I have not found a way to configure them for the old behavior. Nominally the meaning of the XML is not changed - <root/> and <root></root> both mean there's an empty root element, so anyone parsing the XML shouldn't notice. (Our test is obviously just looking at the text as written.) So this is somewhat like a change from putting out JSON or pretty-printed JSON. I guess it's better to note it in the change log even if it shouldn't matter.
I do like this snarky stackoverflow response to someone wanting to configure this:
"Most serializers I know of do not allow you to choose whether or not to use empty element tags in the output, for the simple reason that no sane consumer of XML should care whether they are used or not. If you do care, and are not insane, it would help to explain why you care."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classic. Thanks for adding a note.
| } | ||
|
|
||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just sticking this at the bottom of the "dataaccess" section...
I noticed a failure at edu.harvard.iq.dataverse.dataaccess.RemoteOverlayAccessIOTest.testRemoteOverlayFiles here: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-11360/22/testReport/edu.harvard.iq.dataverse.dataaccess/RemoteOverlayAccessIOTest/testRemoteOverlayFiles/
I merged the latest and pushed a small doc change to force another run. Let's keep an eye on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly github is no more reliable than our sites in terms of making sure a file on the web is always available. It's their 503 response that broke the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this. I hope you think it's ok. Please feel free to revert or tweak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this glassfish file needed? Can a comment be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a comment in 0e4e680.
pdurbin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Thanks for addressing my feedback.
API tests are passing and took 19 minutes: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-11360/24/testReport/
I did very little testing myself but I did run S3AccessIT locally and it worked.
Off to QA.
|
this branch has conflicts - needs an update. |
|
Ready to merge this - just need branch conflicts resolved. Thanks! |
* update dvObject featured item response * fix doc * let doc writers know to pip install new requirements IQSS#10618 * improve release note IQSS#10618 * move file count limit from dvObject to dvObjectContainer * make use of "version" string in "making releases" doc IQSS#10618 * various tweaks to docs IQSS#10618 * typo IQSS#10618 * mention alpha tag is going away IQSS#10618 * clarify * IQSS#11542 get all type counts after first page * ci(ct): re-enable conditional workflows in upstream repository Removed temporary testing condition for `gdcc/wip-base-image` and reinstated the `if` condition to restrict workflow runs to the upstream `IQSS` repository. * docs(ct): add README for backports directory and usage guidelines Introduced documentation detailing the structure, purpose, and usage of the `src/backports` directory to support consistent patch management across multiple Dataverse release versions. * Added: getSuperuserOrAssigneeDataverseRolesFor method to RoleAssigneeServiceBean * Refactor: RoleAssigneeServiceBean error message extracted to Bundle string * Added: getAssignedRoles API endpoint * keep initial dataverse on invalid input * release not change according to code * adding datasetFileUploadsAvailable to response * fix flyway script name * fix replace * fix null pointer * fix superuser ignore limits * Added: tweaks and IT for getUserSelectableRoles * Added: IT cases for testGetUserSelectableRoles * Added: docs for roles/userSelectable * Added: docs for IQSS#11434 * add missing params to api doc * add checks for restricted files and unpublished dataverse and dataset * fix unit test * fix it test by piblishing dataverse and dataset * allow for limit of 0 to make dataset effectivly read only * fix unit test * fix(ct): don't push images with buildx we haven't build * fix(ct): backporting skip push tags to old releases * fix(ct): remove typos in patch file headers * add ui integration with file count limit * Update src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java Co-authored-by: Guillermo Portas <[email protected]> * fix datafile not found error and added tests * add comments to new IT tests * build(deps): bump org.postgresql:postgresql in /modules/dataverse-parent Bumps [org.postgresql:postgresql](https://github.com/pgjdbc/pgjdbc) from 42.7.4 to 42.7.7. - [Release notes](https://github.com/pgjdbc/pgjdbc/releases) - [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md) - [Commits](pgjdbc/pgjdbc@REL42.7.4...REL42.7.7) --- updated-dependencies: - dependency-name: org.postgresql:postgresql dependency-version: 42.7.7 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * add dvObject display name to json * add new tests * change flyway script from 6.6.02 to 6.6.01 * rename flyway script from 6.6.0.1 to 6.6.0.2 * Migration from forgemia.inra.fr to forge.inrae.fr * fix: add datasetType to JSON serialization of DatasetVersion THe reason to add this here is that some client tools might request specific versions of a dataset, but need the information about the type, too. For example some client might want to display a specific version of a software instead of a dataset differently. It is debatable if this should be included here, as so far there is no consensus yet wether or not the type of a dataset should be allowed to change between versions of it. On the other hand, the global ID should also never change, but it is included anyway for the sake of getting back to the dataset via the API if necessary. This should be revisited as a JSON API design issue at a later point, especially as when creating a dataset version, the type is at the root level and the version itself is at a sublevel, rendering the API inconsistent from the start. * Update apps.rst Added dataverse-metadata-crawler (https://github.com/scholarsportal/dataverse-metadata-crawler) to the API Guide/Apps for the documentation. * fix picky sphinx error * Create 11581-metadata-crawler.md * add crawler to reporting tools * use WSL not cmd.exe for Windows dev, simplify IQSS#10606 * add release note IQSS#10606 * fix links IQSS#11549 * IQSS#11542 update comments * re-work the deleteion of datafile featured items when publishing dataset * fix tests by setting publication date on datafile * fix typo in doc * changes per review * Apply suggestions from code review Co-authored-by: Copilot <[email protected]> * Changes per review * adding limit check on multi file upload * address code review feedback: add OPS, say "3.x" IQSS#11518 * add reference to LC mdb * remove unused imports, info->fine per review * improve release note IQSS#11360 * add note about the executor resource * add note about XML serialization changes * improve Search Services docs IQSS#11281 * Update doc/release-notes/TKLabels.md Co-authored-by: Omer Fahim <[email protected]> * Update doc/release-notes/TKLabels.md Co-authored-by: Philip Durbin <[email protected]> * fix after class name change * Update src/main/java/edu/harvard/iq/dataverse/search/SearchService.java Co-authored-by: Philip Durbin <[email protected]> * PIDs in addition to DOIs * add defaultService to API * static solr string * add settings to release note/docs * more name changes * test: add unit test for datasetType inclusion in DatasetVersion JSON serialization * docs: clarify dataset fields consistency across dataset versions in API guide IQSS#11573 Added a note explaining the constancy of `datasetId`, `datasetPersistentId`, and `datasetType` fields for all dataset versions in the API documentation. The first two have been around since v4.18, see also issue IQSS#6397 * partial update * fix broken page - name change, remove legacy section * exclude ext search classes * IQSS#11546 fix compare separator logic * use constant in more places * Remove the doc/sphinx-guides/HOWTO-SPHINX-INSTALL.txt file, because it is out-of-date and the updated information is part of the docs at doc/sphinx-guides/source/contributor/documentation.md. * Add a feature flag to optionally enable permission check in LC api * disable/don't load any non-solr search services in war * fix indenting IQSS#11281 * add comments to keep excludes in sync IQSS#11281 * tweak release note IQSS#11281 * finish windows instructions * fix list * minor tweaks IQSS#10606 * combine snippets (so far, with issue numbers) into 6.7 release notes IQSS#11367 * fix heading level * start reording, other tweaks * Add WSL preparation steps and package installation instructions * Update WSL installation instructions for package management and SDKMAN usage * Update WSL installation instructions to use powershell code blocks for package installation * Clarify WSL preparation steps and SDKMAN usage in documentation * minor clean-up * reordering and tweaking IQSS#11367 * more reordering and tweaking, add highlights IQSS#11367 * rework through end of API updates IQSS#11367 * edited up to upgrade steps IQSS#11367 * clean up upgrade instructions IQSS#11367 * update guides links to 6.7 IQSS#11367 * tweaks IQSS#11367 * mention MCP IQSS#11367 * add Rclone to list of integrations IQSS#11608 * make limit update a superuser only function * tweak highlights IQSS#11367 * rework video subtitles section IQSS#11367 * reword settings IQSS#11367 * curation labels updates, Solr indexing IQSS#11367 * various tweaks IQSS#11367 * add the rest of backward incompats IQSS#11367 * throw exception if non superuser changes the limit * throw exception if non superuser changes the limit * fix tests * consolidate creating DocumentBuilder * cleanup * missing import * handle XMLReader * update XMLInputStream, final closes * refactor to reduce dupe code * remove unused imports * tests * style issue * typo from earlier update * simple release note * add release note and link to announcement and PR IQSS#11608 * avoid parallel testing with cache * transformer, validator updates * avoid props/attr that don't exist in our versions * move Rclone from "getting data in" to "analysis and computation" IQSS#11608 * clarify Rclone support is read only IQSS#11608 * Update doc/release-notes/6.7-release-notes.md typo fix * Update doc/release-notes/6.7-release-notes.md typo fix * Update doc/release-notes/6.7-release-notes.md fixed typo * Update doc/release-notes/6.7-release-notes.md added a space * add windows dev to 6.7 releases notes IQSS#11367 * add configurable search services to 6.7 release notes IQSS#11367 * add Security Updates section IQSS#11367 * Updated the locally-built xoai jars with the cherry-picked commit securing b0ea5e5 XMLInputFactory there. * rebuilt the xoai jar files with a modified version, to make the project build on Jenkins without breaking all the other PR builds. * ... and the updated pom file to use the new local version * add xml parsing update IQSS#11367 * add rclone support to release notes IQSS#11367 * clarify that changes can be made later IQSS#11367 * simplify IQSS#11367 * one line per paragraph IQSS#11367 * add note about Solr dir for Docker users IQSS#11367 * Update doc/release-notes/6.7-release-notes.md typo fix * typo: fix URL Co-authored-by: Omer Fahim <[email protected]> * bump to 6.7 and update base.image.version IQSS#11371 * Update doc/release-notes/6.7-release-notes.md added word * switch base.image.version back to non-revision IQSS#11651 --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Steven Winship <[email protected]> Co-authored-by: Philip Durbin <[email protected]> Co-authored-by: Omer Fahim <[email protected]> Co-authored-by: Stephen Kraffmiller <[email protected]> Co-authored-by: Oliver Bertuch <[email protected]> Co-authored-by: GPortas <[email protected]> Co-authored-by: jo-pol <[email protected]> Co-authored-by: qqmyers <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Leonid Andreev <[email protected]> Co-authored-by: Steven Ferey <[email protected]> Co-authored-by: Ken Lui <[email protected]> Co-authored-by: qqmyers <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Jan van Mansum <[email protected]> Co-authored-by: Juan Pablo Tosca Villanueva <[email protected]>
* let doc writers know to pip install new requirements IQSS#10618 * improve release note IQSS#10618 * move file count limit from dvObject to dvObjectContainer * make use of "version" string in "making releases" doc IQSS#10618 * various tweaks to docs IQSS#10618 * typo IQSS#10618 * mention alpha tag is going away IQSS#10618 * clarify * IQSS#11542 get all type counts after first page * ci(ct): re-enable conditional workflows in upstream repository Removed temporary testing condition for `gdcc/wip-base-image` and reinstated the `if` condition to restrict workflow runs to the upstream `IQSS` repository. * docs(ct): add README for backports directory and usage guidelines Introduced documentation detailing the structure, purpose, and usage of the `src/backports` directory to support consistent patch management across multiple Dataverse release versions. * Added: getSuperuserOrAssigneeDataverseRolesFor method to RoleAssigneeServiceBean * Refactor: RoleAssigneeServiceBean error message extracted to Bundle string * Added: getAssignedRoles API endpoint * keep initial dataverse on invalid input * release not change according to code * adding datasetFileUploadsAvailable to response * fix flyway script name * fix replace * fix null pointer * fix superuser ignore limits * Added: tweaks and IT for getUserSelectableRoles * Added: IT cases for testGetUserSelectableRoles * Added: docs for roles/userSelectable * Added: docs for IQSS#11434 * add missing params to api doc * add checks for restricted files and unpublished dataverse and dataset * fix unit test * fix it test by piblishing dataverse and dataset * allow for limit of 0 to make dataset effectivly read only * fix unit test * fix(ct): don't push images with buildx we haven't build * fix(ct): backporting skip push tags to old releases * fix(ct): remove typos in patch file headers * add ui integration with file count limit * Update src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java Co-authored-by: Guillermo Portas <[email protected]> * fix datafile not found error and added tests * add comments to new IT tests * build(deps): bump org.postgresql:postgresql in /modules/dataverse-parent Bumps [org.postgresql:postgresql](https://github.com/pgjdbc/pgjdbc) from 42.7.4 to 42.7.7. - [Release notes](https://github.com/pgjdbc/pgjdbc/releases) - [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md) - [Commits](pgjdbc/pgjdbc@REL42.7.4...REL42.7.7) --- updated-dependencies: - dependency-name: org.postgresql:postgresql dependency-version: 42.7.7 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * add dvObject display name to json * add new tests * change flyway script from 6.6.02 to 6.6.01 * rename flyway script from 6.6.0.1 to 6.6.0.2 * Migration from forgemia.inra.fr to forge.inrae.fr * fix: add datasetType to JSON serialization of DatasetVersion THe reason to add this here is that some client tools might request specific versions of a dataset, but need the information about the type, too. For example some client might want to display a specific version of a software instead of a dataset differently. It is debatable if this should be included here, as so far there is no consensus yet wether or not the type of a dataset should be allowed to change between versions of it. On the other hand, the global ID should also never change, but it is included anyway for the sake of getting back to the dataset via the API if necessary. This should be revisited as a JSON API design issue at a later point, especially as when creating a dataset version, the type is at the root level and the version itself is at a sublevel, rendering the API inconsistent from the start. * Update apps.rst Added dataverse-metadata-crawler (https://github.com/scholarsportal/dataverse-metadata-crawler) to the API Guide/Apps for the documentation. * fix picky sphinx error * Create 11581-metadata-crawler.md * add crawler to reporting tools * use WSL not cmd.exe for Windows dev, simplify IQSS#10606 * add release note IQSS#10606 * fix links IQSS#11549 * IQSS#11542 update comments * re-work the deleteion of datafile featured items when publishing dataset * fix tests by setting publication date on datafile * fix typo in doc * changes per review * Apply suggestions from code review Co-authored-by: Copilot <[email protected]> * Changes per review * adding limit check on multi file upload * address code review feedback: add OPS, say "3.x" IQSS#11518 * add reference to LC mdb * remove unused imports, info->fine per review * improve release note IQSS#11360 * add note about the executor resource * add note about XML serialization changes * improve Search Services docs IQSS#11281 * Update doc/release-notes/TKLabels.md Co-authored-by: Omer Fahim <[email protected]> * Update doc/release-notes/TKLabels.md Co-authored-by: Philip Durbin <[email protected]> * fix after class name change * Update src/main/java/edu/harvard/iq/dataverse/search/SearchService.java Co-authored-by: Philip Durbin <[email protected]> * PIDs in addition to DOIs * add defaultService to API * static solr string * add settings to release note/docs * more name changes * test: add unit test for datasetType inclusion in DatasetVersion JSON serialization * docs: clarify dataset fields consistency across dataset versions in API guide IQSS#11573 Added a note explaining the constancy of `datasetId`, `datasetPersistentId`, and `datasetType` fields for all dataset versions in the API documentation. The first two have been around since v4.18, see also issue IQSS#6397 * partial update * fix broken page - name change, remove legacy section * exclude ext search classes * IQSS#11546 fix compare separator logic * use constant in more places * Remove the doc/sphinx-guides/HOWTO-SPHINX-INSTALL.txt file, because it is out-of-date and the updated information is part of the docs at doc/sphinx-guides/source/contributor/documentation.md. * Add a feature flag to optionally enable permission check in LC api * disable/don't load any non-solr search services in war * fix indenting IQSS#11281 * add comments to keep excludes in sync IQSS#11281 * tweak release note IQSS#11281 * finish windows instructions * fix list * minor tweaks IQSS#10606 * combine snippets (so far, with issue numbers) into 6.7 release notes IQSS#11367 * fix heading level * start reording, other tweaks * Add WSL preparation steps and package installation instructions * Update WSL installation instructions for package management and SDKMAN usage * Update WSL installation instructions to use powershell code blocks for package installation * Clarify WSL preparation steps and SDKMAN usage in documentation * minor clean-up * reordering and tweaking IQSS#11367 * more reordering and tweaking, add highlights IQSS#11367 * rework through end of API updates IQSS#11367 * edited up to upgrade steps IQSS#11367 * clean up upgrade instructions IQSS#11367 * update guides links to 6.7 IQSS#11367 * tweaks IQSS#11367 * mention MCP IQSS#11367 * add Rclone to list of integrations IQSS#11608 * make limit update a superuser only function * tweak highlights IQSS#11367 * rework video subtitles section IQSS#11367 * reword settings IQSS#11367 * curation labels updates, Solr indexing IQSS#11367 * various tweaks IQSS#11367 * add the rest of backward incompats IQSS#11367 * throw exception if non superuser changes the limit * throw exception if non superuser changes the limit * fix tests * consolidate creating DocumentBuilder * cleanup * missing import * handle XMLReader * update XMLInputStream, final closes * refactor to reduce dupe code * remove unused imports * tests * style issue * typo from earlier update * simple release note * add release note and link to announcement and PR IQSS#11608 * avoid parallel testing with cache * transformer, validator updates * avoid props/attr that don't exist in our versions * move Rclone from "getting data in" to "analysis and computation" IQSS#11608 * clarify Rclone support is read only IQSS#11608 * Update doc/release-notes/6.7-release-notes.md typo fix * Update doc/release-notes/6.7-release-notes.md typo fix * Update doc/release-notes/6.7-release-notes.md fixed typo * Update doc/release-notes/6.7-release-notes.md added a space * add windows dev to 6.7 releases notes IQSS#11367 * add configurable search services to 6.7 release notes IQSS#11367 * add Security Updates section IQSS#11367 * Updated the locally-built xoai jars with the cherry-picked commit securing b0ea5e5 XMLInputFactory there. * rebuilt the xoai jar files with a modified version, to make the project build on Jenkins without breaking all the other PR builds. * ... and the updated pom file to use the new local version * add xml parsing update IQSS#11367 * add rclone support to release notes IQSS#11367 * clarify that changes can be made later IQSS#11367 * simplify IQSS#11367 * one line per paragraph IQSS#11367 * add note about Solr dir for Docker users IQSS#11367 * Update doc/release-notes/6.7-release-notes.md typo fix * typo: fix URL Co-authored-by: Omer Fahim <[email protected]> * bump to 6.7 and update base.image.version IQSS#11371 * Update doc/release-notes/6.7-release-notes.md added word * switch base.image.version back to non-revision IQSS#11651 * catch failing pubworkflow * Fix for curation status indication. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Philip Durbin <[email protected]> Co-authored-by: Omer Fahim <[email protected]> Co-authored-by: Steven Winship <[email protected]> Co-authored-by: Stephen Kraffmiller <[email protected]> Co-authored-by: Oliver Bertuch <[email protected]> Co-authored-by: GPortas <[email protected]> Co-authored-by: jo-pol <[email protected]> Co-authored-by: qqmyers <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Leonid Andreev <[email protected]> Co-authored-by: Steven Ferey <[email protected]> Co-authored-by: Ken Lui <[email protected]> Co-authored-by: qqmyers <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Jan van Mansum <[email protected]> Co-authored-by: Juan Pablo Tosca Villanueva <[email protected]> Co-authored-by: Florian Fritze <[email protected]>
What this PR does / why we need it: The v1 AWS SDK Dataverse uses for S3 is in maintenance mode and will reach its end of life in Dec. 2025. This PR updates Dataverse to use v2.x of the SDK.
Which issue(s) this PR closes:
Special notes for your reviewer:
From initial testing, it looks like the older localstack used in testing does not support the new API. The PR updates to localstack/localstack:4.2.0 in the docker-compose-dev file.
Sword uses old versions of apache abdera. I've added exclusions to avoid failures with S3. Updating Sword might be a useful thing to do at some point.
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: inc.
Additional documentation: