Open
Commits
21 commits
882c325
Updated output schema tutorial notebook
ccarpentiere Aug 15, 2025
4289019
Additional small spelling fixes for bq_dataframes_llm_output_schema.i…
ccarpentiere Aug 18, 2025
7986c05
Merge branch 'main' into output_schema
ccarpentiere Aug 18, 2025
5ec67a6
Merge branch 'main' into output_schema
ccarpentiere Aug 18, 2025
6b38388
Removed Vertex AI import link
ccarpentiere Aug 19, 2025
bc6a687
Merge branch 'main' into output_schema
ccarpentiere Aug 19, 2025
d56df99
Merge branch 'googleapis:main' into output_schema
ccarpentiere Aug 19, 2025
f8de2ea
Merge branch 'output_schema' of https://github.com/ccarpentiere/pytho…
ccarpentiere Aug 19, 2025
8a376f5
Merge branch 'googleapis:main' into output_schema
ccarpentiere Aug 20, 2025
485a852
Remove placeholder project name that is breaking tests
ccarpentiere Aug 20, 2025
90f8c10
Merge branch 'output_schema' of https://github.com/ccarpentiere/pytho…
ccarpentiere Aug 20, 2025
2f6df1d
Merge branch 'main' into output_schema
ccarpentiere Aug 27, 2025
4776872
Merge branch 'main' into output_schema
ccarpentiere Aug 27, 2025
c34409d
Merge branch 'main' into output_schema
ccarpentiere Aug 28, 2025
7af8a91
Merge branch 'main' into output_schema
ccarpentiere Sep 2, 2025
c0f71ef
Merge branch 'main' into output_schema
ccarpentiere Sep 3, 2025
25a1be0
Merge branch 'main' into output_schema
ccarpentiere Sep 8, 2025
1b82370
Merge branch 'main' into output_schema
ccarpentiere Sep 9, 2025
ac67291
Merge branch 'main' into output_schema
ccarpentiere Sep 9, 2025
e9700a2
Merge branch 'main' into output_schema
ccarpentiere Sep 9, 2025
378ebe4
Merge branch 'main' into output_schema
ccarpentiere Sep 17, 2025
175 changes: 155 additions & 20 deletions notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb
@@ -25,7 +25,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigFrames LLM Output Schema\n",
"# Format LLM output using an output schema\n",
"\n",
"<table align=\"left\">\n",
"\n",
@@ -43,7 +43,7 @@
" <td>\n",
" <a href=\"https://console.cloud.google.com/bigquery/import?url=https://github.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb\">\n",
" <img src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTW1gvOovVlbZAIZylUtf5Iu8-693qS1w5NJw&s\" alt=\"BQ logo\" width=\"35\">\n",
" Open in BQ Studio\n",
" Open in BigQuery Studio\n",
" </a>\n",
" </td>\n",
"</table>\n"
@@ -53,26 +53,124 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This Notebook introduces BigFrames LLM with output schema to generate structured output dataframes."
"This notebook shows you how to create structured LLM output by specifying an output schema when generating predictions with a Gemini model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup"
"## Costs\n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"* BigQuery (compute)\n",
"* BigQuery ML\n",
"* Generative AI support on Vertex AI\n",
"\n",
"Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models), [Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing),\n",
"and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#section-11),\n",
"and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n",
"to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before you begin\n",
"\n",
"Complete the tasks in this section to set up your environment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up your Google Cloud project\n",
"\n",
"**The following steps are required, regardless of your notebook environment.**\n",
"\n",
"1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n",
"\n",
"2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
"\n",
"3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com) to enable the following APIs:\n",
"\n",
" * BigQuery API\n",
" * BigQuery Connection API\n",
" * Vertex AI API\n",
"\n",
"4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Authenticate your Google Cloud account\n",
"\n",
"Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below.\n",
"\n",
"**BigQuery Studio** or **Vertex AI Workbench**\n",
"\n",
"No action is needed; you are already authenticated.\n",
"\n",
"**Local JupyterLab instance**\n",
"\n",
"Uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PROJECT = \"bigframes-dev\" # replace with your project\n",
"# ! gcloud auth login"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab**\n",
"\n",
"Uncomment and run the following cell:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# from google.colab import auth\n",
"# auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up your project"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set your project and import necessary modules. If you don't know your project ID, see [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PROJECT = \"\" # replace with your project\n",
"import bigframes\n",
"# Setup project\n",
"bigframes.options.bigquery.project = PROJECT\n",
"bigframes.options.display.progress_bar = None\n",
"\n",
@@ -84,8 +182,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Create a BigFrames DataFrame and a Gemini model\n",
"Starting from creating a simple dataframe of several cities and a Gemini model in BigFrames"
"## Create a DataFrame and a Gemini model\n",
"Create a simple [DataFrame](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame) of several cities:"
]
},
{
@@ -162,6 +260,13 @@
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Connect to a Gemini model using the [`GeminiTextGenerator` class](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator):"
]
},
{
"cell_type": "code",
"execution_count": 4,
@@ -186,8 +291,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Generate structured output data\n",
"Before, llm models can only generate text output. Saying if you want to know whether the city is a US city, for example:"
"## Generate structured output data\n",
"Without an output schema, the model returns only text output. For example, you can generate output that identifies whether a given city is a US city:"
]
},
{
@@ -273,9 +378,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The outputs are text results that human can read. But if want the output data to be more useful for analysis, it is better to transfer to structured data like boolean, int or float values. Usually the process wasn't easy.\n",
"The output is human-readable text. For analysis, however, structured output is more useful, especially when you want Boolean, integer, or float values instead of strings. Previously, getting output in this form required extra processing of the text.\n",
"\n",
"Now you can get structured output out-of-the-box by specifying the output_schema parameter in Gemini model predict method. In below example, the outputs are only boolean values."
"Now, you can get structured output out of the box by specifying the `output_schema` parameter when calling the Gemini model's [`predict` method](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator#bigframes_ml_llm_GeminiTextGenerator_predict). In the following example, the model output is formatted as Boolean values:"
]
},
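For contrast, the following minimal sketch shows the kind of post-processing that text-only output requires, written in plain Python. The `parse_yes_no` helper is hypothetical and is not part of BigQuery DataFrames. The sample answers are made up for illustration. Any answer that doesn't begin with a plain "yes" or "no" falls through to `None`, and that brittleness is what a typed output column sidesteps.

```python
import re
from typing import Optional


def parse_yes_no(text: str) -> Optional[bool]:
    """Map a free-text yes/no answer to a Boolean, or None if it is ambiguous.

    Hypothetical helper for illustration only.
    """
    # Lowercase and keep only alphabetic words, so punctuation and case don't matter.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    # Answers such as "Not sure." or "It depends." can't be mapped reliably.
    return None


# Made-up answers showing how varied free-text output can be.
answers = ["Yes.", "no", "  YES, it is in California", "Not sure.", "It depends."]
print([parse_yes_no(a) for a in answers])
```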
{
@@ -361,7 +466,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also get float or int values, for example, to get populations in millions:"
"You can also format model output as float or integer values. In the following example, the model output is formatted as float values to show the city's population in millions:"
]
},
{
@@ -447,7 +552,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And yearly rainy days:"
"In the following example, the model output is formatted as integer values to show the number of rainy days per year in each city:"
]
},
{
@@ -533,10 +638,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Generate all types of data in one prediction\n",
"You can get the different output columns and types in one prediction. \n",
"### Format output as multiple data types in one prediction\n",
"Within a single prediction, you can generate multiple columns of output that use different data types. \n",
"\n",
"Note it doesn't require dedicated prompts, as long as the output column names are informative to the model."
"You don't need to write a dedicated prompt for each output column, as long as the output column names are informative to the model."
]
},
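Conceptually, an output schema maps each column name to a type name, and each generated record should conform to it. The following minimal sketch checks one JSON record against such a mapping in plain Python. Several parts are assumptions for illustration only: the `SCALAR_TYPES` table, the `validate_record` helper, and the JSON input are all hypothetical. This is not how BigQuery DataFrames processes model output internally.

```python
import json
from typing import Any

# Hypothetical mapping from the scalar type names used in this notebook's
# output_schema examples to the Python types that json.loads produces.
SCALAR_TYPES: dict[str, tuple[type, ...]] = {
    "bool": (bool,),
    "int64": (int,),
    "float64": (float, int),  # a JSON number such as 8 decodes to int
    "string": (str,),
}


def validate_record(raw_json: str, schema: dict[str, str]) -> dict[str, Any]:
    """Decode one JSON record and check each column against its declared type."""
    record = json.loads(raw_json)
    typed: dict[str, Any] = {}
    for column, type_name in schema.items():
        value = record[column]  # raises KeyError if a column is missing
        # bool is a subclass of int in Python, so reject it for non-bool columns.
        if isinstance(value, bool) and type_name != "bool":
            raise TypeError(f"{column}: expected {type_name}, got bool")
        if not isinstance(value, SCALAR_TYPES[type_name]):
            raise TypeError(
                f"{column}: expected {type_name}, got {type(value).__name__}"
            )
        # Normalize whole-number JSON values to float for float64 columns.
        typed[column] = float(value) if type_name == "float64" else value
    return typed


schema = {
    "is_US_city": "bool",
    "population_in_millions": "float64",
    "rainy_days_per_year": "int64",
}
# Made-up JSON record used only to exercise the function.
print(validate_record(
    '{"is_US_city": false, "population_in_millions": 8, "rainy_days_per_year": 150}',
    schema,
))
```

Declaring all columns in one mapping mirrors the single-prediction approach above: one record carries every typed field at once.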
{
@@ -630,14 +735,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Generate composite data types"
"### Format output as a composite data type"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Composite datatypes like array and struct can also be generated. Here the example generates a places_to_visit column as array of strings and a gps_coordinates as struct of floats. Along with previous fields, all in one prediction."
"You can generate composite data types like arrays and structs. The following example generates a `places_to_visit` column as an array of strings and a `gps_coordinates` column as a struct of floats:"
]
},
{
@@ -744,6 +849,36 @@
"result = gemini.predict(df, prompt=[df[\"city\"]], output_schema={\"is_US_city\": \"bool\", \"population_in_millions\": \"float64\", \"rainy_days_per_year\": \"int64\", \"places_to_visit\": \"array<string>\", \"gps_coordinates\": \"struct<latitude float64, longitude float64>\"})\n",
"result[[\"city\", \"is_US_city\", \"population_in_millions\", \"rainy_days_per_year\", \"places_to_visit\", \"gps_coordinates\"]]"
]
},
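The composite type strings in the `output_schema` above nest element and field types inside angle brackets. As a rough illustration of that structure, the following sketch parses `array<...>` and `struct<name type, ...>` strings into nested Python tuples. The `parse_type` and `split_top_level` helpers are hypothetical. They only handle the forms used in this notebook and are not BigQuery's own type parser.

```python
from typing import Union

# A parsed type is a scalar name such as "float64", ("array", element_type),
# or ("struct", [(field_name, field_type), ...]).
ParsedType = Union[str, tuple]


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside <...>."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "<":
            depth += 1
        elif char == ">":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_type(type_string: str) -> ParsedType:
    """Parse a type string such as 'array<string>' into a nested description."""
    text = type_string.strip()
    lowered = text.lower()
    if lowered.startswith("array<") and text.endswith(">"):
        # Recurse into the element type between "array<" and the final ">".
        return ("array", parse_type(text[len("array<"):-1]))
    if lowered.startswith("struct<") and text.endswith(">"):
        fields = []
        for field in split_top_level(text[len("struct<"):-1]):
            # Each field is "<name> <type>"; the type itself may be composite.
            name, field_type = field.split(maxsplit=1)
            fields.append((name, parse_type(field_type)))
        return ("struct", fields)
    return lowered  # scalar type name such as "float64"


schema = {
    "places_to_visit": "array<string>",
    "gps_coordinates": "struct<latitude float64, longitude float64>",
}
for column, type_string in schema.items():
    print(column, "->", parse_type(type_string))
```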
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"\n",
"To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
"project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
"\n",
"Otherwise, run the following cell to delete the temporary cloud artifacts created during the BigFrames session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bpd.close_session()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)."
]
}
],
"metadata": {