Replies: 1 comment 1 reply
An idea that was mentioned a while back related to this was using a dunder function. So we'd check for something like
You can then create a wrapper class around your object that contains the parameters + object you want to return.
The other idea sounds too far out imo; that would be better reserved for a table UDF.
TL;DR
Allow registering a Python callback (e.g. `register_replacement_scan(my_callback)`) that can dynamically resolve table names into actual data sources (RecordBatchReader, DataFrame, etc.), similar to replacement scans in the C++ API, without modifying local/global scope.
Description
I would like to have a function like `register_replacement_scan(my_callback)` available in the Python API.
The idea is that `my_callback` receives a table name and returns whatever type is accepted by `register()` (e.g. a DataFrame, RecordBatchReader, etc.) or `None` if it doesn't apply.
Motivation
I want to pass arguments to my PyArrow RecordBatch/Dataset dynamically.
This feature would serve as a lightweight alternative to table-valued functions. It should also be easier to implement: a workaround already exists today, but it relies on manipulating the local/global scope.
Current workaround
For example, running this query:
requires the following workaround in Python:
Proposed solution
The same query:
could be handled much more cleanly with:
This would allow transparent handling of custom sources or file formats without pre-registering them manually.
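To make the callback contract concrete, here is a minimal sketch in plain Python. It is not the original snippet from this post and not an existing API: `register_replacement_scan` is the proposed name, while `resolve_table`, `in_memory_scan`, and the `mem:` prefix are hypothetical and only there for illustration. The sketch assumes callbacks are tried in registration order and the first non-`None` result wins.

```python
from typing import Any, Callable, List, Optional

# A replacement scan takes a table name and returns a data source, or None.
ReplacementScan = Callable[[str], Optional[Any]]

# Hypothetical stand-in for the engine-side list of registered callbacks.
_replacement_scans: List[ReplacementScan] = []


def register_replacement_scan(callback: ReplacementScan) -> None:
    """Proposed API (sketch): remember a callback that may resolve table names."""
    _replacement_scans.append(callback)


def resolve_table(name: str) -> Optional[Any]:
    """Hypothetical lookup the engine would run for an unknown table name."""
    for callback in _replacement_scans:
        source = callback(name)
        if source is not None:
            # First callback that recognises the name wins.
            return source
    return None  # nobody claimed the name; the engine would raise its usual error


def in_memory_scan(table_name: str) -> Optional[Any]:
    """Example callback: only handles names with the made-up 'mem:' prefix."""
    prefix = "mem:"
    if not table_name.startswith(prefix):
        return None  # "doesn't apply", let other callbacks try
    key = table_name[len(prefix):]
    # A real callback would return a DataFrame or RecordBatchReader here;
    # a list of dicts keeps the sketch dependency-free.
    return [{"source": key, "row": i} for i in range(3)]


register_replacement_scan(in_memory_scan)
```

With this in place, `resolve_table("mem:events")` returns the rows produced by `in_memory_scan`, and `resolve_table("something_else")` returns `None`, all without touching local or global variables.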
Extended proposal
An additional improvement would be to introduce a "reader registry", allowing registration of callable read functions that accept parameters.
This extends the idea of `register_replacement_scan()` by enabling parameterized SQL functions that internally map to Python callbacks.
Example
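A rough, hypothetical sketch of what a reader registry could look like on the Python side. None of these names exist in any library: `register_reader`, `call_reader`, and `read_range` are made up for illustration, and the sketch assumes the SQL layer would forward the function name and its arguments to `call_reader`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: SQL-visible function name -> Python read function.
_readers: Dict[str, Callable[..., Any]] = {}


def register_reader(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator (sketch) that exposes a read function under the given name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        _readers[name] = func
        return func
    return decorator


def call_reader(name: str, *args: Any, **kwargs: Any) -> Any:
    """What the engine might do when it sees name(args...) in a FROM clause."""
    try:
        reader = _readers[name]
    except KeyError:
        raise LookupError(f"no reader registered under {name!r}") from None
    return reader(*args, **kwargs)


@register_reader("read_range")
def read_range(start: int, stop: int, step: int = 1) -> Any:
    # A real reader would return a RecordBatchReader or DataFrame;
    # a list of dicts keeps the sketch dependency-free.
    return [{"value": v} for v in range(start, stop, step)]
```

The difference from a plain replacement scan is that the callback receives parameters instead of parsing them out of a table name.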
Then the corresponding SQL query could look like:
Summary
Adding `register_replacement_scan()`, and optionally extending it with a reader registry, would make it much easier to: