| 1 | +# Migrating BigQuery Storage API from v1beta1 to v1: Python |
| 2 | + |
| 3 | +This guide shows how to migrate Python code using the BigQuery Storage API from |
| 4 | +version `v1beta1` to `v1`. |
| 5 | + |
| 6 | +## Key Changes |
| 7 | + |
| 8 | +* **Package Imports**: `google.cloud.bigquery_storage_v1beta1` -> |
| 9 | + `google.cloud.bigquery_storage_v1` |
| 10 | +* **Service Client**: `BigQueryStorageClient` is replaced by |
| 11 | + `BigQueryReadClient`. |
| 12 | +* **Table Reference**: The `TableReference` type is replaced by a string |
| 13 | + containing the table path, set in `ReadSession.table`. |
| 14 | +* **Session Configuration**: Configuration fields (table, format, read |
| 15 | + options) have moved into the `ReadSession` type, which is passed in |
| 16 | + `CreateReadSessionRequest`. |
| 17 | +* **Parallelism**: The `requested_streams` parameter of `create_read_session` |
| 18 | + is replaced by `max_stream_count`. |
| 19 | +* **Sharding Strategy**: `sharding_strategy` is removed. The server now |
| 20 | + automatically balances the streams. |
| 21 | +* **Read Rows Request**: `StreamPosition` is removed. Pass the stream name |
| 22 | + directly to `read_rows` as `name` (the `read_stream` field in the proto), |
| 23 | + together with the `offset`. |
| 24 | + |
| 25 | +## Code Comparison |
| 26 | + |
| 27 | +### 1. Client Initialization |
| 28 | + |
| 29 | +**v1beta1:** |
| 30 | + |
| 31 | +```python |
| 32 | +from google.cloud import bigquery_storage_v1beta1 |
| 33 | + |
| 34 | +client = bigquery_storage_v1beta1.BigQueryStorageClient() |
| 35 | +``` |
| 36 | + |
| 37 | +**v1:** |
| 38 | + |
| 39 | +```python |
| 40 | +from google.cloud import bigquery_storage_v1 |
| 41 | + |
| 42 | +client = bigquery_storage_v1.BigQueryReadClient() |
| 43 | +``` |
| 44 | + |
| 45 | +### 2. Creating a Read Session |
| 46 | + |
| 47 | +**v1beta1:** |
| 48 | + |
| 49 | +```python |
| 50 | +from google.cloud import bigquery_storage_v1beta1 |
| 51 | + |
| 52 | +table_ref = bigquery_storage_v1beta1.types.TableReference() |
| 53 | +table_ref.project_id = "bigquery-public-data" |
| 54 | +table_ref.dataset_id = "usa_names" |
| 55 | +table_ref.table_id = "usa_1910_current" |
| 56 | + |
| 57 | +read_options = bigquery_storage_v1beta1.types.TableReadOptions() |
| 58 | +read_options.selected_fields.append("name") |
| 59 | +read_options.row_restriction = 'state = "WA"' |
| 60 | + |
| 61 | +session = client.create_read_session( |
| 62 | + table_reference=table_ref, |
| 63 | + parent='projects/read-session-project', |
| 64 | + read_options=read_options, |
| 65 | + requested_streams=1, |
| 66 | + format_=bigquery_storage_v1beta1.enums.DataFormat.AVRO, |
| 67 | + sharding_strategy=bigquery_storage_v1beta1.enums.ShardingStrategy.LIQUID |
| 68 | +) |
| 69 | +``` |
| 70 | + |
| 71 | +**v1:** |
| 72 | + |
| 73 | +```python |
| 74 | +from google.cloud import bigquery_storage_v1 |
| 75 | + |
| 76 | +# Table path is now a string: projects/{project}/datasets/{dataset}/tables/{table} |
| 77 | +table_path = "projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_current" |
| 78 | + |
| 79 | +read_options = bigquery_storage_v1.types.ReadSession.TableReadOptions() |
| 80 | +read_options.selected_fields.append("name") |
| 81 | +read_options.row_restriction = 'state = "WA"' |
| 82 | + |
| 83 | +# ReadSession holds the session configuration |
| 84 | +read_session = bigquery_storage_v1.types.ReadSession( |
| 85 | + table=table_path, |
| 86 | + data_format=bigquery_storage_v1.types.DataFormat.AVRO, # format renamed to data_format |
| 87 | + read_options=read_options |
| 88 | +) |
| 89 | + |
| 90 | +session = client.create_read_session( |
| 91 | + parent="projects/read-session-project", |
| 92 | + read_session=read_session, |
| 93 | + max_stream_count=1 # requested_streams renamed to max_stream_count |
| 94 | +) |
| 95 | +``` |
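
Code that was written for v1beta1 often keeps the project, dataset, and table IDs as separate values, because `TableReference` required them that way. As a minimal sketch, the v1 path string could be assembled with a small helper like the one below. The name `table_path_from_ids` is hypothetical and not part of the client library, and the validation is an assumption (it only rejects empty IDs and IDs containing `/`).

```python
def table_path_from_ids(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build the v1 table path from the three IDs that TableReference held.

    Hypothetical helper for illustration; not part of the client library.
    """
    parts = (
        ("project_id", project_id),
        ("dataset_id", dataset_id),
        ("table_id", table_id),
    )
    for label, value in parts:
        # Assumed sanity check: an empty ID or a stray '/' would corrupt the path.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty ID without '/': {value!r}")
    return f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"


# Same table as in the comparison above.
table_path = table_path_from_ids("bigquery-public-data", "usa_names", "usa_1910_current")
print(table_path)
```

The resulting string can be passed as `table` when constructing `ReadSession`.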
| 96 | + |
| 97 | +### 3. Reading Rows |
| 98 | + |
| 99 | +**v1beta1:** |
| 100 | + |
| 101 | +```python |
| 102 | +from google.cloud import bigquery_storage_v1beta1 |
| 103 | + |
| 104 | +position = bigquery_storage_v1beta1.types.StreamPosition( |
| 105 | + stream=session.streams[0], |
| 106 | + offset=0 |
| 107 | +) |
| 108 | + |
| 109 | +reader = client.read_rows(read_position=position) |
| 110 | + |
| 111 | +for row in reader.rows(session): |
| 112 | + print(row["name"]) |
| 113 | +``` |
| 114 | + |
| 115 | +**v1:** |
| 116 | + |
| 117 | +```python |
| 118 | +# read_rows accepts stream name and offset directly. |
| 119 | +# Note that the parameter name in the library helper is 'name', which maps to 'read_stream' in proto. |
| 120 | +reader = client.read_rows( |
| 121 | + name=session.streams[0].name, |
| 122 | + offset=0 |
| 123 | +) |
| 124 | + |
| 125 | +# Recent v1 client versions take the schema from the stream itself, so |
| 126 | +# rows() can be called without passing the session. |
| 127 | +for row in reader.rows(): |
| 128 | + print(row["name"]) |
| 129 | +``` |