Skip to content

Conversation

@TebaleloS
Copy link
Collaborator

@TebaleloS TebaleloS commented Mar 23, 2023

Notes

  • Initialized DEFAULT_PARQUET_DATETIME_READ_MODE and DEFAULT_PARQUET_DATETIME_WRITE_MODE with a default value CORRECTED.
  • Added option condition for assessing the value of datetime read/write mode

Closes #2175

@TebaleloS TebaleloS marked this pull request as ready for review March 23, 2023 07:41
Copy link
Collaborator

@miroslavpojer miroslavpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pulled
  • code review
  • command generation test

Used command-line calls:

  • run_conformance.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run
  • run_standardization.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run
  • run_standardization_conformance.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run

Only bash files file were tested.
There is required to test last cmd one.

Copy link
Contributor

@dk1844 dk1844 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

  • code reviewed
  • pulled
  • prepared dummy Enceladus env to see the spark-submit command
  • run

@dk1844
Copy link
Contributor

dk1844 commented Mar 24, 2023

I have tested both the sh and cmd script to see that they correctly pass the datetime configuration fields down to the spark-submit. I have not run the actual job.

sh

./run_standardization.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

./run_conformance.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

./run_standardization_conformance.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

cmd

.\run_standardization.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY ->
spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

.\run_conformance.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY ->
spark-submit --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ✔️

.\run_standardization_conformance.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit" ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

@dk1844 dk1844 added the PR:tested Only for PR - PR was tested by a tester (person) label Mar 24, 2023
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just read the code, given that two reviewers tested it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR:tested Only for PR - PR was tested by a tester (person)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ability to configure how Spark handles dates in parquet files.

5 participants