[BATCHCMP-379] Support subdirectory reads by anuvedverma · Pull Request #62 · lyft/spark

anuvedverma · 2025-12-05T07:50:24Z

(cherry picked from commit 984bf78)

Brings Spark 3.3 patch #40 into our Spark 3.5 fork, with a couple small modifications:

- To get the build working, we set spark.sql.sources.readPartitionWithSubdirectory.enabled to false by default (it is then set to true in the Spark wrapper: https://github.com/lyft/spark-private/pull/1575).
- OSS Spark had removed the listLeafDirStatuses and listLeafDirStatuses methods from SparkHadoopUtil.scala -- this PR brings them back for our fork.
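
As a rough illustration of what a leaf-directory listing does, the sketch below walks a local directory tree and returns the directories that contain no subdirectories. It uses Python's pathlib rather than the Hadoop FileSystem API, and `list_leaf_dirs` is a hypothetical name. It is not the code restored in SparkHadoopUtil.scala.

```python
from pathlib import Path
from typing import List


def list_leaf_dirs(base: Path) -> List[Path]:
    """Return every directory under `base` (inclusive) that has no subdirectories.

    Hypothetical local-filesystem analogue of a leaf-directory listing;
    not the fork's actual implementation.
    """
    if not base.is_dir():
        raise ValueError(f"{base} is not a directory")
    subdirs = sorted(p for p in base.iterdir() if p.is_dir())
    if not subdirs:
        # No child directories: `base` itself is a leaf.
        return [base]
    leaves: List[Path] = []
    for sub in subdirs:
        # Recurse into each child directory and collect its leaves.
        leaves.extend(list_leaf_dirs(sub))
    return leaves
```

A directory that holds only files counts as a leaf here, so a partition directory with data files directly inside it is returned as-is, while a partition with nested subdirectories is expanded into its deepest directories.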

This patch is needed because a large number of tables have nested subdirectories (e.g. most tables under the feature schema, generated by featureservice) -- see thread for more context.
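
To make the failure mode concrete, here is a minimal sketch of a partition whose data files sit below the partition directory. A flat listing finds no files, while a recursive listing finds them. The `read_subdirectories` parameter and the function name are assumptions for illustration: the parameter only stands in for what spark.sql.sources.readPartitionWithSubdirectory.enabled is expected to toggle, and the code does not call Spark.

```python
from pathlib import Path
from typing import List


def list_partition_files(partition_dir: Path, read_subdirectories: bool) -> List[Path]:
    """List data files for one partition directory.

    `read_subdirectories` is a hypothetical stand-in for the config flag:
    False looks only at direct children, True descends into nested directories.
    """
    if read_subdirectories:
        candidates = partition_dir.rglob("*")   # all descendants, any depth
    else:
        candidates = partition_dir.iterdir()    # direct children only
    return sorted(p for p in candidates if p.is_file())
```

With a made-up layout such as `ds=2025-12-01/run-1/part-0.parquet`, the flat call returns an empty list and the recursive call returns the one file.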

Eventually, we want to fix the underlying tables that rely on nested subdirectories and remove this patch and config entirely, but fixing these tables will take a longer, dedicated effort. Created backlog ticket BATCHCMP-459 for that future work.

semgrep-code-lyft-go-lcs-scandocs · 2025-12-05T07:57:29Z

Legal Risk

The following dependencies were released under a license that
has been flagged by your organization for consideration.

Recommendation

While merging is not directly blocked, it's best to pause and consider what it means to use this license before continuing. If you are unsure, reach out to your security team or Semgrep admin to address this issue.

EPL-1.0

EPL-2.0

GPL-2.0

LGPL-2.1

semgrep-code-lyft-go-lcs-scandocs · 2025-12-05T08:15:32Z

Legal Risk

The following dependencies were released under a license that
has been flagged by your organization for consideration.

Recommendation

While merging is not directly blocked, it's best to pause and consider what it means to use this license before continuing. If you are unsure, reach out to your security team or Semgrep admin to address this issue.

EPL-1.0

EPL-2.0

org.glassfish.jersey.media:jersey-media-jaxb 2.40

github-actions bot added the SQL label Dec 5, 2025

[SPARK-28098][SQL]Support read partitioned Hive tables with (#40)

6e63856

(cherry picked from commit 984bf78)

anuvedverma force-pushed the BATCHCMP-379-support-subdirectory-reads branch from c60b6f9 to dcd8c6d Compare December 5, 2025 08:09

github-actions bot added the CORE label Dec 5, 2025

add listLeafDirStatuses and listLeafDirStatuses back to SparkHadoopUtil.scala

4d917dc

anuvedverma force-pushed the BATCHCMP-379-support-subdirectory-reads branch from dcd8c6d to 4d917dc Compare December 5, 2025 08:13

set default spark.sql.sources.readPartitionWithSubdirectory.enabled=false

52d4c1c

anuvedverma marked this pull request as ready for review December 6, 2025 00:13

anuvedverma merged commit 51989e6 into v3.5.6-lyft Dec 6, 2025
79 of 86 checks passed

anuvedverma deleted the BATCHCMP-379-support-subdirectory-reads branch December 6, 2025 00:20

anuvedverma merged 3 commits into v3.5.6-lyft from BATCHCMP-379-support-subdirectory-reads

2 participants
