@cottrell cottrell commented Oct 21, 2025

Rationale for this change

Users who want to coerce every CSV column to a single type currently have to pre-compute a schema or enumerate column names. Adding a default type on ConvertOptions removes that friction (e.g. “read everything as string”).

What changes are included in this PR?

  • add ConvertOptions::column_type in C++ and honor it in the CSV reader when no per-column mapping exists
  • expose the knob as csv.ConvertOptions(column_type=…) in PyArrow, with documentation updates
  • extend the Python CSV tests to cover string, integer, and float defaults, including include_missing_columns
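The precedence rule in the first bullet says that an explicit per-column entry wins and the new default applies only to columns without one. It can be illustrated in plain Python. This is a hypothetical helper, not the C++ reader code from this PR. The name resolve_column_types and the use of type-name strings are assumptions made for illustration, and None stands for "no default set, fall back to type inference".

```python
from typing import Dict, Mapping, Optional, Sequence


def resolve_column_types(
    names: Sequence[str],
    column_types: Mapping[str, str],
    default_type: Optional[str],
) -> Dict[str, Optional[str]]:
    """Hypothetical sketch of per-column type resolution.

    An explicit per-column mapping takes precedence; otherwise the
    single default type is used. None means "infer the type".
    """
    resolved: Dict[str, Optional[str]] = {}
    for name in names:
        if name in column_types:
            # Per-column mapping exists: it always wins.
            resolved[name] = column_types[name]
        else:
            # No mapping: use the default (None -> fall back to inference).
            resolved[name] = default_type
    return resolved


print(resolve_column_types(["a", "b", "c"], {"b": "int64"}, "string"))
```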

Are these changes tested?

  • make test-csv
  • pytest python/pyarrow/tests/test_cpp_internals.py
  • a full pytest python/pyarrow run times out in the dataset backpressure test in this environment (documented limitation)

Are there any user-facing changes?

  • new public column_type parameter on pyarrow.csv.ConvertOptions
  • documentation additions showing how to set a single default type

Component(s)

C++, Python

@github-actions

⚠️ GitHub issue #47897 has been automatically assigned in GitHub to the PR creator.

@github-actions

⚠️ GitHub issue #47897 has no components, please add labels for components.

@AlenkaF
Member

AlenkaF commented Oct 24, 2025

Thank you for the contribution @cottrell.
There is a very similar, probably also AI-generated (?) PR up: #47663. It looks more complete, so I propose pushing that one forward if other maintainers agree on the approach.

@cottrell
Author

Yes! I did search for this but didn't find it. Definitely go with that one if it's already in progress or better. It is partly agentic, yes. Interesting that someone else has also run into this; it's a very minor but useful addition. I will close this for now.

@cottrell cottrell closed this Oct 24, 2025