@tswast tswast commented Oct 7, 2025

Also, fixes several constructors that didn't take a session for compatibility with multi-session applications.

🦕

…ples

Also, fixes several constructors that didn't take a session for
compatibility with multi-session applications.
@tswast tswast requested review from a team as code owners October 7, 2025 21:56
@tswast tswast requested a review from TrevorBergeron October 7, 2025 21:56
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Oct 7, 2025
@product-auto-label product-auto-label bot added api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. labels Oct 7, 2025
doctest_namespace["np"] = np
doctest_namespace["pd"] = pd
doctest_namespace["pa"] = pa
doctest_namespace["bpd"] = polars_session
Contributor

I wonder if we should instead just inject the Polars session as the global session? Not sure all the methods are the same, but I guess it works so far?

Collaborator Author

Unfortunately, there's quite a bit that isn't supported yet on the Polars session. Doing it this way means that we can override bpd to be the BQ version in the samples themselves with a simple import.
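The override mechanism can be sketched with the standard-library `doctest` module. Everything named here is a stand-in for illustration: `polars_session` is a dummy object rather than the real fixture, and `import json as bpd` only plays the role of `import bigframes.pandas as bpd`. The point is just that a name injected into the doctest namespace is shadowed by an import inside the sample.

```python
import doctest
import types

# Hypothetical stand-in for the Polars-backed session injected by conftest.py.
polars_session = types.SimpleNamespace(backend="polars")

# A sample that relies on the injected default.
DEFAULT_SAMPLE = """
>>> bpd.backend
'polars'
"""

# A sample that rebinds `bpd` with an import. `json` is only a placeholder
# for `bigframes.pandas`, so the sketch runs without extra dependencies.
OVERRIDE_SAMPLE = """
>>> import json as bpd
>>> bpd.__name__
'json'
"""


def run_sample(docstring, name):
    """Run one docstring with `bpd` pre-populated in its namespace."""
    namespace = {"bpd": polars_session}
    test = doctest.DocTestParser().get_doctest(
        docstring, namespace, name, "<sketch>", 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed=..., attempted=...)


default_result = run_sample(DEFAULT_SAMPLE, "default")
override_result = run_sample(OVERRIDE_SAMPLE, "override")
```

Both samples pass: the first sees the injected object, the second sees whatever the import bound to `bpd`.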

Comment on lines 2290 to 2291
# These are included so that Session and bigframes.pandas can be used
# interchangeably.
Contributor

Is this purely for doctests? If so, can we just inject the session for doctests? Or are we trying to enable something else?

Collaborator Author

IMO, it's important for consistency. I actually uncovered a few cases where the session should have been supplied to things like to_datetime() with local data but wasn't.

Comment on lines 221 to 223
) -> Union[pandas.Timestamp, datetime.datetime, bigframes.series.Series]:
    return global_session.with_default_session(
        bigframes.session.Session.to_datetime,
Contributor

Not sure I understand this change.

Collaborator Author

to_datetime() has code paths that take local data. It was using the global session implicitly when it constructed the Series objects. Now it can take a session explicitly.
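A simplified sketch of that delegation pattern follows. The `Session` class, `get_global_session()`, and the tuple return value are hypothetical stand-ins, not the BigFrames implementation; only the shape of `with_default_session(Session.to_datetime, ...)` mirrors the snippet under review.

```python
class Session:
    """Hypothetical stand-in for a session that owns the objects it creates."""

    def __init__(self, name):
        self.name = name

    def to_datetime(self, values):
        # Results are tagged with the session that produced them, so local
        # data is never silently attached to some other session.
        return [(self.name, value) for value in values]


_global_session = None


def get_global_session():
    """Lazily create the process-wide default session."""
    global _global_session
    if _global_session is None:
        _global_session = Session("global")
    return _global_session


def with_default_session(func, *args, **kwargs):
    """Call an unbound Session method with the global session as `self`."""
    return func(get_global_session(), *args, **kwargs)


def to_datetime(values):
    # Module-level convenience wrapper: same logic, default session.
    return with_default_session(Session.to_datetime, values)


from_global = to_datetime(["2025-10-07"])
from_explicit = Session("other").to_datetime(["2025-10-07"])
```

With this shape, the module-level function and the method share one code path, and a multi-session application can call the method on whichever session it wants.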

**Examples:**
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Contributor

Why do we still need some of the bpd imports?

Collaborator Author

This is how I made some samples use the BQ session instead of the Polars session. In this case, hash() is unimplemented:

third_party/bigframes_vendored/pandas/core/generic.py ..........F.                                                                 [100%]

================================================================ FAILURES ================================================================
______________________________ [doctest] third_party.bigframes_vendored.pandas.core.generic.NDFrame.sample _______________________________
558             dog            4          0                  2
559             spider         8          0                  1
560             fish           0          0                  8
561             <BLANKLINE>
562             [4 rows x 3 columns]
563 
564         Fetch one random row from the DataFrame (Note that we use `random_state`
565         to ensure reproducibility of the examples):
566 
567             >>> df.sample(random_state=1)
UNEXPECTED EXCEPTION: NotImplementedError("Polars compiler hasn't implemented hash()")
Traceback (most recent call last):
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/doctest.py", line 1350, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest third_party.bigframes_vendored.pandas.core.generic.NDFrame.sample[2]>", line 1, in <module>
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 197, in wrapper
    raise e
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 182, in wrapper
    return method(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/dataframe.py", line 794, in __repr__
    pandas_df, row_count, query_job = self._block.retrieve_repr_request_results(
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/blocks.py", line 1658, in retrieve_repr_request_results
    head_result = self.session._executor.execute(
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/testing/polars_session.py", line 48, in execute
    lazy_frame: polars.LazyFrame = self.compiler.compile(array_value.node)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 577, in compile
    return self.compile_node(node)
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
    return self.compile_node(node.child).select(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 601, in compile_filter
    return self.compile_node(node.child).filter(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 649, in compile_offsets
    return self.compile_node(node.child).with_columns(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
    return self.compile_node(node.child).select(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 601, in compile_filter
    return self.compile_node(node.child).filter(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 649, in compile_offsets
    return self.compile_node(node.child).with_columns(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
    return self.compile_node(node.child).select(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 607, in compile_orderby
    frame = self.compile_node(node.child)
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 630, in compile_selection
    return self.compile_node(node.child).select(
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 639, in compile_projection
    new_col = self.expr_compiler.compile_expression(bound_expr).alias(name.sql)
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 180, in _
    return self.compile_op(op, *args)
  File "/usr/local/google/home/swast/.pyenv/versions/3.10.16/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/bigframes/core/compile/polars/compiler.py", line 184, in compile_op
    raise NotImplementedError(f"Polars compiler hasn't implemented {op}")
NotImplementedError: Polars compiler hasn't implemented hash()
/usr/local/google/home/swast/src/github.com/googleapis/python-bigquery-dataframes/third_party/bigframes_vendored/pandas/core/generic.py:567: UnexpectedException

Aside: As much as possible I'd like to encourage us BigFrames devs to implement our ops in the Polars session as well as BQ, so defaulting to Polars is a subtle nudge in that direction.
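For readers unfamiliar with the failure mode in the traceback, here is a minimal sketch of a dispatch-with-fallback compiler. The `functools.py ... _method` frames suggest `functools.singledispatchmethod` is involved, but the classes below (`AddOp`, `HashOp`, `ExprCompiler`) are hypothetical and much simpler than the real Polars compiler.

```python
import functools


class AddOp:
    """Hypothetical op that the sketch compiler supports."""

    def __str__(self):
        return "add()"


class HashOp:
    """Hypothetical op with no registered implementation."""

    def __str__(self):
        return "hash()"


class ExprCompiler:
    @functools.singledispatchmethod
    def compile_op(self, op, *args):
        # Fallback: reached when no implementation is registered for type(op).
        raise NotImplementedError(f"Sketch compiler hasn't implemented {op}")

    @compile_op.register(AddOp)
    def _(self, op, left, right):
        # Registered implementation, selected by the type of `op`.
        return left + right


compiler = ExprCompiler()
supported = compiler.compile_op(AddOp(), 1, 2)

try:
    compiler.compile_op(HashOp())
except NotImplementedError as exc:
    unsupported_message = str(exc)
```

Supporting a new op in this sketch means registering one more method, after which the fallback is no longer reached for that op type.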

Comment on lines 2331 to 2333
MultiIndex.from_tuples = bigframes.core.indexes.MultiIndex.from_tuples # type: ignore
MultiIndex.from_frame = bigframes.core.indexes.MultiIndex.from_frame # type: ignore
MultiIndex.from_arrays = bigframes.core.indexes.MultiIndex.from_arrays # type: ignore
Collaborator Author

TODO: these should probably take a Session argument, too.

dummy.pkl Outdated
Collaborator Author

Remove this

@tswast tswast force-pushed the tswast-doctest-boilerplate branch from c3a23ea to 8b069e8 Compare October 9, 2025 17:18
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Oct 9, 2025
@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: m Pull request size is medium. labels Oct 9, 2025