Commit 3ddf7da

feat: BigQuery Storage v1beta1 API migration guide
1 parent 22ecaf0 commit 3ddf7da

1 file changed

Lines changed: 129 additions & 0 deletions

@@ -0,0 +1,129 @@
# Migrating BigQuery Storage API from v1beta1 to v1: Python

This guide shows how to migrate Python code using the BigQuery Storage API from
version `v1beta1` to `v1`.

## Key Changes

* **Package Imports**: `google.cloud.bigquery_storage_v1beta1` is replaced by
  `google.cloud.bigquery_storage_v1`.
* **Service Client**: `BigQueryStorageClient` is replaced by
  `BigQueryReadClient`.
* **Table Reference**: The `TableReference` type is replaced by a plain string
  holding the table path, set in `ReadSession.table`.
* **Session Configuration**: Configuration fields (table, format, read
  options) have moved into the `ReadSession` type, which is passed in
  `CreateReadSessionRequest`.
* **Parallelism**: The `requested_streams` parameter of `create_read_session`
  is replaced by `max_stream_count`.
* **Sharding Strategy**: `sharding_strategy` is removed. The server now
  balances the streams automatically.
* **Read Rows Request**: `StreamPosition` is flattened. You now pass the
  stream name directly to `read_rows` as `name` (which maps to `read_stream`
  in the proto), along with the `offset`.

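Code that already holds the three `TableReference` components can build the v1 table path with a small helper. The sketch below is illustrative only: `to_v1_table_path` is a hypothetical name, not part of the client library, and the validation it performs is an assumption about what counts as a sensible input.

```python
def to_v1_table_path(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build the string form of a table path for ReadSession.table.

    Hypothetical helper (not provided by the client library). It rejects
    empty components and components containing '/', since either would
    produce a malformed path.
    """
    parts = (
        ("project_id", project_id),
        ("dataset_id", dataset_id),
        ("table_id", table_id),
    )
    for label, value in parts:
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"


print(to_v1_table_path("bigquery-public-data", "usa_names", "usa_1910_current"))
# prints: projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_current
```

The returned string can be assigned to `ReadSession.table` in place of the old `TableReference` object.
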
## Code Comparison

### 1. Client Initialization

**v1beta1:**

```python
from google.cloud import bigquery_storage_v1beta1

client = bigquery_storage_v1beta1.BigQueryStorageClient()
```

**v1:**

```python
from google.cloud import bigquery_storage_v1

client = bigquery_storage_v1.BigQueryReadClient()
```

### 2. Creating a Read Session

**v1beta1:**

```python
from google.cloud import bigquery_storage_v1beta1

table_ref = bigquery_storage_v1beta1.types.TableReference()
table_ref.project_id = "bigquery-public-data"
table_ref.dataset_id = "usa_names"
table_ref.table_id = "usa_1910_current"

read_options = bigquery_storage_v1beta1.types.TableReadOptions()
read_options.selected_fields.append("name")
read_options.row_restriction = 'state = "WA"'

session = client.create_read_session(
    table_reference=table_ref,
    parent='projects/read-session-project',
    read_options=read_options,
    requested_streams=1,
    format=bigquery_storage_v1beta1.types.DataFormat.AVRO,
    sharding_strategy=bigquery_storage_v1beta1.types.ShardingStrategy.LIQUID
)
```

**v1:**

```python
from google.cloud import bigquery_storage_v1

# Table path is now a string: projects/{project}/datasets/{dataset}/tables/{table}
table_path = "projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_current"

read_options = bigquery_storage_v1.types.ReadSession.TableReadOptions()
read_options.selected_fields.append("name")
read_options.row_restriction = 'state = "WA"'

# ReadSession holds the session configuration
read_session = bigquery_storage_v1.types.ReadSession(
    table=table_path,
    data_format=bigquery_storage_v1.types.DataFormat.AVRO,  # format renamed to data_format
    read_options=read_options
)

session = client.create_read_session(
    parent="projects/read-session-project",
    read_session=read_session,
    max_stream_count=1  # requested_streams renamed to max_stream_count
)
```

### 3. Reading Rows

**v1beta1:**

```python
from google.cloud import bigquery_storage_v1beta1

position = bigquery_storage_v1beta1.types.StreamPosition(
    stream=session.streams[0],
    offset=0
)

reader = client.read_rows(read_position=position)

for row in reader.rows(session):
    print(row["name"])
```

**v1:**

```python
# read_rows accepts the stream name and offset directly.
# The parameter is called 'name' in the library helper; it maps to 'read_stream' in the proto.
reader = client.read_rows(
    name=session.streams[0].name,
    offset=0
)

# In v1, passing the session to rows() is optional: the client library
# picks up the schema information from the stream itself.
for row in reader.rows():
    print(row["name"])
```
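
When `max_stream_count` is greater than 1, the session may contain several streams, and each one has to be read to see the whole result. A minimal sketch of doing that sequentially is shown below. `iter_all_rows` is a hypothetical helper name, and it uses only the `read_rows(name=..., offset=...)` and `rows()` calls shown above; reading the streams in parallel is a separate design choice that this sketch does not cover.

```python
from typing import Any, Iterator


def iter_all_rows(client: Any, session: Any) -> Iterator[Any]:
    """Yield rows from every stream of a v1 read session, one stream at a time.

    Hypothetical helper: `client` is expected to behave like a
    BigQueryReadClient and `session` like the object returned by
    create_read_session.
    """
    for stream in session.streams:
        # Each stream is addressed by its name, starting from offset 0.
        reader = client.read_rows(name=stream.name, offset=0)
        yield from reader.rows()
```

With the `client` and `session` from the earlier sections, `for row in iter_all_rows(client, session): print(row["name"])` would print the selected column for every stream in turn.
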
