GH-47897: [C++][Python] Allow default column type for CSV columns #47898

cottrell · 2025-10-21T16:21:25Z

Rationale for this change

Users who want to coerce every CSV column to a single type currently have to pre-compute a schema or enumerate column names. Adding a default type on ConvertOptions removes that friction (e.g. “read everything as string”).

What changes are included in this PR?

add ConvertOptions::column_type in C++ and honor it in the CSV reader when no per-column mapping exists
expose the knob as csv.ConvertOptions(column_type=…) in PyArrow, with documentation updates
extend the Python CSV tests to cover string, integer, and float defaults, including include_missing_columns
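The precedence described in the first item (a per-column mapping wins, the default applies otherwise) can be sketched in plain Python. This is a toy model, not the C++ implementation: `resolve_column_type` is a hypothetical name, types are represented as strings for brevity, and treating `None` as "fall back to type inference" is an assumption.

```python
from typing import Mapping, Optional


def resolve_column_type(
    name: str,
    column_types: Mapping[str, str],
    column_type: Optional[str] = None,
) -> Optional[str]:
    """Pick the conversion type for one column (hypothetical model of the rule)."""
    # An explicit per-column entry always takes priority.
    if name in column_types:
        return column_types[name]
    # Otherwise use the default; None is assumed to mean "infer the type".
    return column_type


per_column = {"id": "int64"}
resolved = {
    name: resolve_column_type(name, per_column, "string")
    for name in ["id", "name", "note"]
}
```

Here `id` keeps its explicit type while `name` and `note` pick up the default.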

Are these changes tested?

make test-csv
pytest python/pyarrow/tests/test_cpp_internals.py
a full pytest python/pyarrow run times out in the dataset backpressure test in this environment (documented limitation)

Are there any user-facing changes?

new public column_type parameter on pyarrow.csv.ConvertOptions
documentation additions showing how to set a single default type

Component(s)

C++, Python

GitHub Issue: [C++][Python] Allow specifying default column type for CSV columns #47897

github-actions · 2025-10-21T16:22:02Z

⚠️ GitHub issue #47897 has been automatically assigned in GitHub to PR creator.

github-actions · 2025-10-21T16:22:03Z

⚠️ GitHub issue #47897 has no components, please add labels for components.

github-actions · 2025-10-21T16:52:42Z

⚠️ GitHub issue #47897 has no components, please add labels for components.

AlenkaF · 2025-10-24T06:26:09Z

Thank you for the contribution @cottrell.
There is a very similar, probably also AI generated (?) PR up: #47663. It looks more complete so I propose pushing that one forward in case other maintainers agree on the approach.

cottrell · 2025-10-24T09:13:06Z

Thank you for the contribution @cottrell. There is a very similar, probably also AI generated (?) PR up: #47663. It looks more complete so I propose pushing that one forward in case other maintainers agree on the approach.

Yes! I did search for this but didn't find it. Definitely go with that one if it's already in progress or better. It is partly agentic, yes. Interesting that someone else has also hit on this very minor but useful addition. I will close this for now.

GH-47897: [C++][Python] Allow default column type for CSV columns

c9d85ae

cottrell requested review from AlenkaF, raulcd and rok as code owners October 21, 2025 16:21

github-actions bot added the Component: C++, Component: Python, Component: Documentation, and awaiting review labels Oct 21, 2025

Merge branch 'main' into csv

d37685b

AlenkaF mentioned this pull request Oct 24, 2025

[C++] CSV reader: add a default column type (or sentinel mapping) to avoid per-column enumeration #47502

Closed

cottrell closed this Oct 24, 2025
